Performance difference between DataFrame and Dataset, especially on Parquet data

2019-06-12 Thread Shivam Sharma
on Dataset is causing OOM issues with the same execution parameters. Thanks -- Shivam Sharma Indian Institute Of Information Technology, Design and Manufacturing Jabalpur Email:- 28shivamsha...@gmail.com LinkedIn:- https://www.linkedin.com/in/28shivamsharma
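
A minimal sketch of the difference at issue, assuming a hypothetical Event schema and path (neither is from the thread): Column-based DataFrame operations are evaluated on Spark's internal binary rows, while a typed Dataset lambda deserializes every row into a JVM object, which is one common explanation for extra heap pressure on the same Parquet input.

    import org.apache.spark.sql.SparkSession

    // Hypothetical schema and path, for illustration only.
    case class Event(id: Long, payload: String)

    val spark = SparkSession.builder().appName("df-vs-ds").getOrCreate()
    import spark.implicits._

    // DataFrame: the Column expression runs against Spark's internal
    // (Tungsten) row format; no per-row JVM objects are allocated.
    val df = spark.read.parquet("/data/events")
    df.filter($"id" > 100).count()

    // Dataset: the typed lambda forces each row to be deserialized
    // into an Event on the JVM heap, which raises GC/OOM pressure.
    val ds = spark.read.parquet("/data/events").as[Event]
    ds.filter(e => e.id > 100).count()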

How does the number of partitions in a DataFrame get decided while reading from Hive

2019-05-22 Thread Shivam Sharma
Hi all, I just need to know how Spark decides how many partitions should be created while reading a table from Hive. Thanks -- Shivam Sharma Indian Institute Of Information Technology, Design and Manufacturing Jabalpur Email:- 28shivamsha...@gmail.com LinkedIn:- https://www.linkedin.com
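
A small probe of the question, with a placeholder table name: when Spark reads a file-based Hive table through its native readers, the partition count is driven mainly by the two file-split settings below (defaults shown); tables that still go through the Hive SerDe follow Hadoop's input-split configuration instead.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("partition-probe")
      .enableHiveSupport()
      // Bigger maxPartitionBytes => fewer, larger read partitions.
      .config("spark.sql.files.maxPartitionBytes", "134217728") // 128 MB
      .config("spark.sql.files.openCostInBytes", "4194304")     // 4 MB
      .getOrCreate()

    val df = spark.table("mydb.mytable") // placeholder table
    println(s"partitions = ${df.rdd.getNumPartitions}")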

Out of Memory while reading a table partition from Hive

2019-05-17 Thread Shivam Sharma
size of files in the table partition (date='2019-05-14'), max file size is 1.1 GB and I have given 7 GB to each executor, so if I am right above then it should not throw OOM. 3. And when I put LIMIT 10, does spark-hive read all files? Thanks -- Shivam Sharma Indian Institute O
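
On point 3, a sketch that can be checked directly (the partition value is from the thread; the table name is a placeholder, and `spark` is the spark-shell session): a LIMIT compiles to a CollectLimit plan and Spark stops consuming partitions once enough rows are produced, so it generally does not scan every file, although it still lists and plans splits for the whole partition.

    // `spark` is the session provided by spark-shell.
    val sample = spark.sql(
      """SELECT * FROM mydb.mytable
        |WHERE date = '2019-05-14'
        |LIMIT 10""".stripMargin)

    sample.explain() // look for CollectLimit / GlobalLimit in the plan
    sample.show()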

GC overhead while reading a table partition from Hive

2019-05-16 Thread Shivam Sharma
ble partition (date='2019-05-14'), max file size is 1.1 GB and I have given 7 GB to each executor, so if I am right above then it should not throw OOM. 3. And when I put LIMIT 10, does spark-hive read all files? Thanks -- Shivam Sharma Indian Institute Of Information Techn
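
A hedged sketch of the usual first steps for a "GC overhead limit exceeded" error on a read like this; the table name and storage-level choice are illustrative, not from the thread.

    import org.apache.spark.storage.StorageLevel

    // `spark` is the spark-shell session; executor sizing such as
    // spark.executor.memory / spark.executor.memoryOverhead must be
    // set before the application starts, not here.
    val part = spark.table("mydb.mytable").where("date = '2019-05-14'")

    // A serialized cache level keeps row objects off the JVM heap,
    // which usually eases GC pressure compared to deserialized caching.
    part.persist(StorageLevel.MEMORY_AND_DISK_SER)
    part.count()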

Re: Persist DataFrame to HDFS considering HDFS block size.

2019-01-21 Thread Shivam Sharma
ck size" > > Arnaud > > On Mon, Jan 21, 2019 at 9:01 AM Shivam Sharma <28shivamsha...@gmail.com> > wrote: > >> Don't we have any property for it? >> >> One more quick question that if files created by Spark is less than HDFS >> block size then the r

Re: Persist DataFrame to HDFS considering HDFS block size.

2019-01-21 Thread Shivam Sharma
Don't we have any property for it? One more quick question: if files created by Spark are smaller than the HDFS block size, will the rest of the block space become unavailable and remain unutilized, or will it be shared with other files? On Mon, Jan 21, 2019 at 1:30 PM Shivam Sharma <28shivam

Persist DataFrame to HDFS considering HDFS block size.

2019-01-19 Thread Shivam Sharma
spark to persist according to HDFS blocks. We have something like this in Hive which solves this problem: set hive.merge.sparkfiles=true; set hive.merge.smallfiles.avgsize=204800; set hive.merge.size.per.task=409600; Thanks -- Shivam Sharma Indian Institute Of Information Technology
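
In lieu of the hive.merge.* settings quoted above, one common workaround is to size the output before writing; a rough sketch in which the size estimate and the one-block-per-file target are assumptions:

    // `spark` is the spark-shell session.
    val targetBytes    = 128L * 1024 * 1024          // assumed HDFS block size
    val estimatedBytes = 10L * 1024 * 1024 * 1024    // assumed dataset size
    val numFiles       = math.max(1, (estimatedBytes / targetBytes).toInt)

    val df = spark.table("mydb.mytable") // placeholder
    df.coalesce(numFiles)
      .write
      .mode("overwrite")
      .parquet("/warehouse/output") // placeholder path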

Having issues when running Spark with S3

2018-05-12 Thread Shivam Sharma
at org.apache.spark.scheduler.Task.run(Task.scala:89) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)
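
The trace is cut off before the failing frame, so the actual cause is not visible here; for reference, a minimal s3a setup covering the most common misconfiguration in such reports (bucket and credential sources are placeholders, and hadoop-aws with its matching AWS SDK must be on the classpath):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("s3a-sketch")
      .config("spark.hadoop.fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
      .config("spark.hadoop.fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))
      .getOrCreate()

    val df = spark.read.parquet("s3a://my-bucket/path/") // placeholder bucket
    df.printSchema()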

Add hive-site.xml at runtime

2017-02-12 Thread Shivam Sharma
. Thanks -- Shivam Sharma

Add hive-site.xml at runtime

2017-02-10 Thread Shivam Sharma
. Thanks -- Shivam Sharma

Re: Add hive-site.xml at runtime

2017-02-10 Thread Shivam Sharma
Did anybody get the above mail? Thanks On Fri, Feb 10, 2017 at 11:51 AM, Shivam Sharma <28shivamsha...@gmail.com> wrote: > Hi, > > I have multiple Hive configurations (hive-site.xml) and because of that > I am not able to add any one Hive configuration in the Spark conf director
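
One way around a shared conf directory, assuming the goal is to choose a metastore per job: supply the relevant hive-site.xml properties on the session builder at runtime (the metastore URI below is a placeholder). Some deployments instead ship a specific hive-site.xml with spark-submit --files, though whether it is picked up depends on how the driver classpath is arranged.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("runtime-hive-conf")
      // Runtime equivalent of the corresponding hive-site.xml entries:
      .config("hive.metastore.uris", "thrift://metastore-host:9083")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("SHOW DATABASES").show()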

Add hive-site.xml at runtime

2017-02-09 Thread Shivam Sharma
. Thanks -- Shivam Sharma