Re: AnalysisException - Infer schema for the Parquet path

2020-05-10 Thread Nilesh Kuchekar
Hi Chetan, You can keep a static Parquet file, and when you create a DataFrame pass the locations of both files with the option mergeSchema set to true. This will always return a DataFrame even if the original file is not present. Kuchekar, Nilesh On Sat, May 9

Customize Partitioner for Datasets

2017-09-28 Thread Kuchekar
Hi, Is there a way we can customize the partitioner for a Dataset to be a Hive hash partitioner rather than the Murmur3 partitioner? Regards, Kuchekar, Nilesh
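There is no public hook to swap the hash used by a Dataset shuffle (the exchange is hard-wired to Murmur3), so one workaround is to drop to the RDD API and partition by a Hive-style hash yourself. A sketch under that assumption: for strings, Hive's bucketing hash equals Java's String.hashCode, which is easy to reproduce (for BMP characters, where one Python character maps to one UTF-16 code unit).

```python
def hive_string_hash(s: str) -> int:
    """Java String.hashCode, which Hive uses to bucket string keys.

    Assumes characters in the Basic Multilingual Plane, where ord(ch)
    matches Java's UTF-16 code unit.
    """
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Fold back into a signed 32-bit int, as Java would.
    return h - (1 << 32) if h >= (1 << 31) else h

# Hypothetical usage on an RDD of (key, value) pairs; partitionBy applies
# the modulo over numPartitions itself:
# partitioned = pair_rdd.partitionBy(numPartitions=8,
#                                    partitionFunc=hive_string_hash)

print(hive_string_hash("abc"))  # → 96354, same as "abc".hashCode() in Java
```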

Re: IOT in Spark

2017-05-18 Thread Kuchekar
Hi Gaurav, You might want to look for Lambda Architecture with Spark. https://www.youtube.com/watch?v=xHa7pA94DbA Regards, Kuchekar, Nilesh On Thu, May 18, 2017 at 8:58 PM, Gaurav1809 <gauravhpan...@gmail.com> wrote: > Hello gurus, > > How exactly it work

Spark UI shows Jobs are processing, but the files are already written to S3

2016-11-16 Thread Kuchekar
Hi, I am running a Spark job which saves the computed data (massive data) to S3. On the Spark UI I see that some jobs are still active, but there is no activity in the logs. Also, on S3 all the data has been written (verified each bucket --> it has a _SUCCESS file). Am I missing something? Thanks. Kuche

Re: Maintaining order of pair rdd

2016-07-26 Thread Kuchekar
,(y,index)); now reduceByKey, and then sort the internal list by the index value. Thanks. Kuchekar, Nilesh On Tue, Jul 26, 2016 at 7:35 PM, janardhan shetty <janardhan...@gmail.com> wrote: > Let me provide step wise details: > > 1. > I have an RDD = { > (ID2,18159
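A minimal sketch of the pattern suggested above: tag each value with its original position (e.g. via zipWithIndex), group by key, then restore the original order inside each group by sorting on the index. The helper below is pure Python; the PySpark usage in the comments is illustrative.

```python
def restore_order(indexed_values):
    """Sort (value, index) pairs by index and drop the index."""
    return [v for v, _ in sorted(indexed_values, key=lambda t: t[1])]

# Hypothetical PySpark usage on an RDD of (key, value) pairs:
# ordered = (rdd.zipWithIndex()                                   # ((k, v), i)
#               .map(lambda kv_i: (kv_i[0][0], (kv_i[0][1], kv_i[1])))
#               .groupByKey()
#               .mapValues(restore_order))

print(restore_order([("b", 1), ("a", 0), ("c", 2)]))  # → ['a', 'b', 'c']
```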

Re: Heavy Stage Concentration - Ends With Failure

2016-07-19 Thread Kuchekar
in the Stage tab of the Spark UI. Kuchekar, Nilesh On Tue, Jul 19, 2016 at 8:16 PM, Aaron Jackson <ajack...@pobox.com> wrote: > Hi, > > I have a cluster with 15 nodes of which 5 are HDFS nodes. I kick off a > job that creates some 120 stages. Eventually, the active and pending &g
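When one stage concentrates on a few tasks like this, the usual cause is key skew: one hot key funnels most records into a single partition. A hedged sketch of key salting, one common mitigation (all names here are illustrative): append a random salt so the hot key spreads over several shuffle partitions, aggregate, then strip the salt and merge the partial results in a second pass.

```python
import random

def add_salt(key: str, num_salts: int, rnd: random.Random) -> str:
    """Spread one logical key over num_salts shuffle keys."""
    return f"{key}#{rnd.randrange(num_salts)}"

def strip_salt(salted_key: str) -> str:
    """Recover the original key after the first aggregation pass."""
    return salted_key.rsplit("#", 1)[0]

# Hypothetical two-pass PySpark aggregation (here, a sum):
# rnd = random.Random()
# partial = (pair_rdd.map(lambda kv: (add_salt(kv[0], 8, rnd), kv[1]))
#                    .reduceByKey(lambda a, b: a + b))
# totals  = (partial.map(lambda kv: (strip_salt(kv[0]), kv[1]))
#                   .reduceByKey(lambda a, b: a + b))

print(strip_salt(add_salt("hot_key", 8, random.Random(0))))  # → hot_key
```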

Re: Memory issues on spark

2016-02-17 Thread Kuchekar
you are setting. Kuchekar, Nilesh On Wed, Feb 17, 2016 at 8:02 PM, <arun.bong...@cognizant.com> wrote: > Hi All, > > I have been facing memory issues in Spark. I'm using spark-sql on AWS EMR. > I have an around 50GB file in AWS S3. I want to read this file in a BI tool > connecte
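For a workload like this, the settings the reply alludes to are typically passed on the command line. A hedged starting point, not a recipe (the right values depend on the EMR instance type; note that Spark versions of that era used the `spark.yarn.executor.memoryOverhead` name):

```shell
# Illustrative values only: size executor memory and overhead to the
# instance type, and raise shuffle partitions so a 50 GB input does not
# concentrate in a few oversized tasks.
spark-sql \
  --conf spark.executor.memory=10g \
  --conf spark.yarn.executor.memoryOverhead=2048 \
  --conf spark.executor.cores=4 \
  --conf spark.sql.shuffle.partitions=400
```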

Re: Spark executor Memory profiling

2016-02-10 Thread Kuchekar
oryOverhead","4000") conf = conf.set("spark.executor.cores","4").set("spark.executor.memory", "15G").set("spark.executor.instances","6") Is it also possible to use reduceBy in place of groupBy that might help the shuffling to
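A sketch of the reduceByKey substitution the preview suggests: groupByKey ships every value across the network before aggregating, while reduceByKey combines values map-side first, shrinking the shuffle and the executor-memory pressure. The helper below is a pure-Python analogue for illustration; the PySpark lines in the comments are the hypothetical equivalent.

```python
def reduce_by_key(pairs, f):
    """Pure-Python analogue of RDD.reduceByKey, for illustration only."""
    acc = {}
    for k, v in pairs:
        # Combine incrementally per key, never materializing the full group.
        acc[k] = f(acc[k], v) if k in acc else v
    return sorted(acc.items())

# Hypothetical PySpark equivalent of the two styles:
# sums = pair_rdd.reduceByKey(lambda a, b: a + b)   # map-side combine
# sums = pair_rdd.groupByKey().mapValues(sum)       # heavier shuffle

print(reduce_by_key([("a", 1), ("b", 2), ("a", 3)], lambda a, b: a + b))
# → [('a', 4), ('b', 2)]
```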