Re: Spark read csv option - capture exception in a column in permissive mode

2019-06-16 Thread Ajay Thompson
There's a column which captures the corrupted record. However, the exception isn't captured. If the exception is captured in another column it'll be very useful.

On Mon, 17 Jun, 2019, 10:56 AM Gourav Sengupta wrote:
> Hi,
>
> it already does, I think, you just have to add the column in the

Re: Creating Spark buckets that Presto / Athena / Hive can leverage

2019-06-16 Thread Gourav Sengupta
Hi Daniel, not quite sure of this, but does Glue Data Catalogue support bucketing yet? You might want to find that out first. Regards, Gourav

On Sat, Jun 15, 2019 at 1:30 PM Daniel Mateus Pires wrote:
> Hi there!
>
> I am trying to optimize joins on data created by Spark, so I'd like to

Re: Spark read csv option - capture exception in a column in permissive mode

2019-06-16 Thread Gourav Sengupta
Hi, it already does, I think, you just have to add the column in the schema that you are using to read. Regards, Gourav

On Sun, Jun 16, 2019 at 2:48 PM wrote:
> Hi Team,
>
> Can we have another column which gives the corrupted record reason in
> permissive mode while reading csv.

Re: Exposing JIRA issue types at GitHub PRs

2019-06-16 Thread Hyukjin Kwon
Labels look good and useful.

On Sat, 15 Jun 2019, 02:36 Dongjoon Hyun wrote:
> Now, you can see the exposed component labels (ordered by the number of
> PRs) here and click the component to search.
>
> https://github.com/apache/spark/labels?sort=count-desc
>
> Dongjoon.
>
> On Fri, Jun

Spark read csv option - capture exception in a column in permissive mode

2019-06-16 Thread ajay.thompson
Hi Team,

Can we have another column which gives the corrupted record reason in permissive mode while reading CSV?

Thanks,
Ajay

Re: [Pyspark 2.3+] Timeseries with Spark

2019-06-16 Thread Rishi Shah
Thanks Jörn. I am interested in timeseries forecasting for now, but in general I was unable to find a good way to work with different time series methods using Spark.

On Fri, Jun 14, 2019 at 1:55 AM Jörn Franke wrote:
> Time series can mean a lot of different things and algorithms. Can you

Re: Spark 2.4.3 - Structured Streaming - high on Storage Memory

2019-06-16 Thread puneetloya
Just more info on the above post: have been seeing a lot of these logs:

1) The state for version 15109 (other numbers too) doesn't exist in loadedMaps. Reading snapshot file and delta files if needed... Note that this is normal for the first batch of starting query.
2) KafkaConsumer cache hitting
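The `loadedMaps` messages come from the HDFS-backed state store provider, which caches recent state versions in executor memory. If that cache is what's growing, these are the knobs I'd look at first (a sketch, assuming Spark 2.4 and the default state store; `your_streaming_app.py` is a placeholder):

```shell
spark-submit \
  --conf spark.sql.streaming.maxBatchesToRetainInMemory=2 \
  --conf spark.sql.streaming.minBatchesToRetain=100 \
  your_streaming_app.py
```

`maxBatchesToRetainInMemory` (added in Spark 2.4) bounds how many state versions are kept in `loadedMaps`; `minBatchesToRetain` governs how many snapshot/delta files are kept around for recovery.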