Re: Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided

2017-01-17 Thread Raju Bairishetti
Thanks for the detailed explanation. Is it completely fixed in spark-2.1.0? We are giving very high memory to the spark driver to avoid the OOM (heap space / GC overhead limit) errors in the spark app. But when we run two or three jobs together, they bring down the Hive metastore. We had to
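A quick way to gauge the metadata volume involved is to count the table's partitions directly. A minimal sketch, assuming a Hive-enabled sqlContext and the rajub.dummy table mentioned later in this thread:

    // Sketch: count the partitions the metastore must serve for this table.
    val partitions = sqlContext.sql("show partitions rajub.dummy")
    println(s"partition count: ${partitions.count()}")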

Re: Weird experience Hive with Spark Transformations

2017-01-17 Thread Chetan Khatri
But Hive 1.2.1 does not ship with a hive-site.xml; I tried to add my own, which caused several other issues. On the other hand, it works well for me with Hive 2.0.1, where the hive-site.xml content was as below and copied to spark/conf too. It worked. *5. hive-site.xml configuration setup* Add below at
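For reference, a minimal sketch of pointing Spark at an existing Hive metastore from code rather than via hive-site.xml; the thrift URI is a placeholder, not taken from the thread:

    import org.apache.spark.sql.SparkSession

    // Sketch (Spark 2.x): configure the metastore connection programmatically.
    // "thrift://metastore-host:9083" is a hypothetical URI -- substitute your own.
    val spark = SparkSession.builder()
      .appName("HiveIntegrationCheck")
      .config("hive.metastore.uris", "thrift://metastore-host:9083")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("show databases").show()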

Re: Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided

2017-01-17 Thread Michael Allman
I think I understand. Partition pruning for the case where spark.sql.hive.convertMetastoreParquet is true was not added to Spark until 2.1.0. I think that in previous versions it only worked when spark.sql.hive.convertMetastoreParquet is false. Unfortunately, that configuration gives you data
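If the 2.1.0 fix applies, one way to verify pruning with the parquet conversion left on might look like the sketch below (table and partition column taken from this thread; the filter value is illustrative, and the exact plan text varies by version):

    // Sketch (Spark 2.1+): keep the native parquet reader and let the metastore prune.
    spark.conf.set("spark.sql.hive.convertMetastoreParquet", "true")
    spark.conf.set("spark.sql.hive.metastorePartitionPruning", "true")

    val df = spark.sql("select count(1) from rajub.dummy where year = '2017'")
    df.explain(true) // only the matching partition locations should appear in the plan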

Re: Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided

2017-01-17 Thread Raju Bairishetti
Tested on both 1.5.2 and 1.6.1. On Wed, Jan 18, 2017 at 12:52 PM, Michael Allman wrote: > What version of Spark are you running? > > On Jan 17, 2017, at 8:42 PM, Raju Bairishetti wrote: > > describe dummy; > > OK > > sample string > > year

Re: Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided

2017-01-17 Thread Michael Allman
What version of Spark are you running? > On Jan 17, 2017, at 8:42 PM, Raju Bairishetti wrote: > > describe dummy; > > OK > > sample string > > year string > > month

Re: Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided

2017-01-17 Thread Raju Bairishetti
describe dummy;
OK
sample    string
year      string
month     string
# Partition Information
# col_name    data_type    comment
year      string
month     string
val df = sqlContext.sql("select count(1) from rajub.dummy

Re: Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided

2017-01-17 Thread Michael Allman
Can you paste the actual query plan here, please? > On Jan 17, 2017, at 7:38 PM, Raju Bairishetti wrote: > > > On Wed, Jan 18, 2017 at 11:13 AM, Michael Allman > wrote: > What is the physical query plan after you set >

Re: Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided

2017-01-17 Thread Raju Bairishetti
On Wed, Jan 18, 2017 at 11:13 AM, Michael Allman wrote: > What is the physical query plan after you set > spark.sql.hive.convertMetastoreParquet > to true? > Physical plan contains all the partition locations > > Michael > > On Jan 17, 2017, at 6:51 PM, Raju Bairishetti

Re: Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided

2017-01-17 Thread Michael Allman
What is the physical query plan after you set spark.sql.hive.convertMetastoreParquet to true? Michael > On Jan 17, 2017, at 6:51 PM, Raju Bairishetti wrote: > > Thanks Michael for the response. > > > On Wed, Jan 18, 2017 at 2:45 AM, Michael Allman

Re: Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided

2017-01-17 Thread Raju Bairishetti
Thanks Michael for the response. On Wed, Jan 18, 2017 at 2:45 AM, Michael Allman wrote: > Hi Raju, > > I'm sorry this isn't working for you. I helped author this functionality > and will try my best to help. > > First, I'm curious why you set

Re: Limit Query Performance Suggestion

2017-01-17 Thread sujith71955
Dear Liang, Thanks for your valuable feedback. There was a mistake in the previous post; I corrected it. As you mentioned, with `GlobalLimit` we will only take the required number of rows from the input iterator, which really pulls data from local blocks and remote blocks. But if the limit value
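For context, a sketch of where GlobalLimit appears in a physical plan; the plan shape in the comments is indicative only and varies across Spark versions:

    // Sketch: a limit followed by further operators is planned as a per-partition
    // LocalLimit feeding a GlobalLimit across partitions.
    val limited = spark.range(1000000L).limit(100).selectExpr("id * 2 as doubled")
    limited.explain()
    // Indicative shape:
    //   Project [id * 2 AS doubled]
    //   +- GlobalLimit 100
    //      +- Exchange SinglePartition
    //         +- LocalLimit 100
    //            +- Range (0, 1000000, ...)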

Feedback on MLlib roadmap process proposal

2017-01-17 Thread Joseph Bradley
Hi all, This is a general call for thoughts about the process for the MLlib roadmap proposed in SPARK-18813. See the section called "Roadmap process." Summary: * This process is about committers indicating intention to shepherd and review. * The goal is to improve visibility and communication.

Re: Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided

2017-01-17 Thread Michael Allman
Hi Raju, I'm sorry this isn't working for you. I helped author this functionality and will try my best to help. First, I'm curious why you set spark.sql.hive.convertMetastoreParquet to false? Can you link specifically to the JIRA issue or Spark PR you referred to? The first thing I would try

Re: Both Spark AM and Client are trying to delete Staging Directory

2017-01-17 Thread Rostyslav Sotnychenko
> I think Rostyslav is using a DFS which logs at warn/error if you try to delete a directory that isn't there, so is seeing warning messages that nobody else does Yep, you are correct. > Rostyslav, like I said, I'd be curious as to which DFS/object store you are working with Unfortunately, I am

Re: GraphX-related "open" issues

2017-01-17 Thread Sean Owen
WontFix or Later is fine. There's not really any practical distinction. I figure that if something times out and is closed, it's very unlikely to be looked at again. Therefore marking it as something to do 'later' seemed less accurate. On Tue, Jan 17, 2017 at 5:30 PM Takeshi Yamamuro

Re: GraphX-related "open" issues

2017-01-17 Thread Takeshi Yamamuro
Thanks for your comment! I'm just thinking I'll set "Won't Fix", though "Later" is also okay. But, I re-checked "Contributing to JIRA Maintenance" in the contribution guide (http://spark.apache.org/contributing.html) and I couldn't find any policy about setting "Later". So, IMO it's okay to set

spark main thread quit, but the driver don't crash at standalone cluster

2017-01-17 Thread John Fang
My spark main thread creates some daemon threads, which may be timer threads. Then the spark application throws some exceptions, and the main thread quits. But the driver JVM doesn't exit on a standalone cluster. Of course, the problem doesn't happen on a YARN cluster, because the application

Re: GraphX-related "open" issues

2017-01-17 Thread Dongjoon Hyun
Hi, Takeshi. > So, IMO it seems okay to close tickets about "Improvement" and "New Feature" > for now. I'm just wondering what value you want to fill in the `Resolution` field for those issues. Maybe, 'Later'? Or, 'Won't Fix'? Bests, Dongjoon.

spark main thread quit, but the Jvm of driver don't crash

2017-01-17 Thread John Fang
My spark main thread creates some daemon threads. Then the spark application throws some exceptions, and the main thread quits. But the driver JVM doesn't exit, so what can I do? For example: val sparkConf = new SparkConf().setAppName("NetworkWordCount")
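One common workaround (a sketch, not from the thread) is to catch failures in main and force the JVM down, since other non-daemon threads in the driver can otherwise keep it alive after the main thread dies:

    import org.apache.spark.{SparkConf, SparkContext}

    object NetworkWordCount {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("NetworkWordCount"))
        try {
          // ... application logic that may throw ...
        } catch {
          case t: Throwable =>
            t.printStackTrace()
            sc.stop()      // release executors cleanly
            System.exit(1) // force the driver JVM to exit despite lingering non-daemon threads
        }
        sc.stop()
      }
    }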

Re: Weird experience Hive with Spark Transformations

2017-01-17 Thread Dongjoon Hyun
Hi, Chetan. Did you copy your `hive-site.xml` into Spark conf directory? For example, cp /usr/local/hive/conf/hive-site.xml /usr/local/spark/conf If you want to use the existing Hive metastore, you need to provide that information to Spark. Bests, Dongjoon. On 2017-01-16 21:36 (-0800),

GraphX-related "open" issues

2017-01-17 Thread Takeshi Yamamuro
Hi, devs Sorry to bother you, but please let me check in advance; in JIRA, there are some open (and inactive) issues about GraphX features. IIUC the current GraphX features are almost frozen and will possibly get no modification except for critical bugs. So, IMO it seems okay to close tickets

Re: Both Spark AM and Client are trying to delete Staging Directory

2017-01-17 Thread Steve Loughran
I think Rostyslav is using a DFS which logs at warn/error if you try to delete a directory that isn't there, so is seeing warning messages that nobody else does. Rostyslav, like I said, I'd be curious as to which DFS/object store you are working with, as it is behaving slightly differently from
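A sketch of the exists-guarded delete pattern under discussion, using the Hadoop FileSystem API; the staging path is hypothetical:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    // Sketch: some filesystems log a warning when asked to delete a missing directory,
    // so check for existence first.
    val fs = FileSystem.get(new Configuration())
    val staging = new Path("/user/spark/.sparkStaging/application_1234_0001") // hypothetical
    if (fs.exists(staging)) {
      fs.delete(staging, true) // recursive delete; the guard makes a second attempt a no-op
    }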

Re: Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided

2017-01-17 Thread Raju Bairishetti
Had a high-level look into the code. It seems the getHiveQlPartitions method from HiveMetastoreCatalog is getting called irrespective of the metastorePartitionPruning conf value. It should not fetch all partitions if we set metastorePartitionPruning to true (the default value for this is false). def
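A sketch of how one might test whether the conf changes the plan (Spark 1.6-era API; the table and filter value are illustrative, taken from elsewhere in the thread):

    // Sketch: flip metastore-side pruning on (default false in 1.6) and recheck the plan.
    sqlContext.setConf("spark.sql.hive.metastorePartitionPruning", "true")
    val df = sqlContext.sql("select count(1) from rajub.dummy where year = '2016'")
    df.explain(true) // if pruning takes effect, only year=2016 partition locations should appear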

Re: Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided

2017-01-17 Thread Raju Bairishetti
Hello, Spark SQL is generating a query plan with all partition information even if we apply filters on partitions in the query. Due to this, the spark driver / hive metastore is hitting OOM, as each table has lots of partitions. We can confirm from the hive audit logs that it tries to
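A minimal reproduction sketch of the reported behavior (table and partition column taken from later messages in the thread; the filter value is illustrative):

    // Sketch (Spark 1.5/1.6 with Hive support): a partition filter that should prune.
    val df = sqlContext.sql("select count(1) from rajub.dummy where year = '2016'")
    df.explain(true)
    // Reported behavior: the plan lists the locations of every partition, and the
    // metastore is asked for all partition metadata rather than a filtered subset.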