Re: Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided

2017-01-17 Thread Raju Bairishetti
Thanks for the detailed explanation. Is it completely fixed in spark-2.1.0? We are giving very high memory to the spark driver to avoid the OOM (heap space / GC overhead limit) errors in the spark app. But when we run two or three jobs together, they bring down the Hive metastore. We had to
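A quick way to gauge the metadata volume involved is to count the table's partitions directly. A minimal sketch, assuming a Hive-enabled sqlContext and the rajub.dummy table mentioned later in this thread:

    // Sketch: count the partitions the metastore must serve for this table.
    val partitions = sqlContext.sql("show partitions rajub.dummy")
    println(s"partition count: ${partitions.count()}")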

Re: Weird experience Hive with Spark Transformations

2017-01-17 Thread Chetan Khatri
But Hive 1.2.1 does not ship with a hive-site.xml; I tried to add my own, which caused several other issues. On the other hand, it works well for me with Hive 2.0.1, where the hive-site.xml content was as below and copied to spark/conf too. It worked. *5. hive-site.xml configuration setup* Add below at
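For reference, a minimal sketch of pointing Spark at an existing Hive metastore from code rather than via hive-site.xml; the thrift URI is a placeholder, not taken from the thread:

    import org.apache.spark.sql.SparkSession

    // Sketch (Spark 2.x): configure the metastore connection programmatically.
    // "thrift://metastore-host:9083" is a hypothetical URI -- substitute your own.
    val spark = SparkSession.builder()
      .appName("HiveIntegrationCheck")
      .config("hive.metastore.uris", "thrift://metastore-host:9083")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("show databases").show()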

Re: Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided

2017-01-17 Thread Michael Allman
I think I understand. Partition pruning for the case where spark.sql.hive.convertMetastoreParquet is true was not added to Spark until 2.1.0. I think that in previous versions it only worked when spark.sql.hive.convertMetastoreParquet is false. Unfortunately, that configuration gives you data
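If the 2.1.0 fix applies, one way to verify pruning with the parquet conversion left on might look like the sketch below (table and partition column taken from this thread; the filter value is illustrative, and the exact plan text varies by version):

    // Sketch (Spark 2.1+): keep the native parquet reader and let the metastore prune.
    spark.conf.set("spark.sql.hive.convertMetastoreParquet", "true")
    spark.conf.set("spark.sql.hive.metastorePartitionPruning", "true")

    val df = spark.sql("select count(1) from rajub.dummy where year = '2017'")
    df.explain(true) // only the matching partition locations should appear in the plan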

Re: Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided

2017-01-17 Thread Raju Bairishetti
Tested on both 1.5.2 and 1.6.1. On Wed, Jan 18, 2017 at 12:52 PM, Michael Allman wrote: > What version of Spark are you running? > > On Jan 17, 2017, at 8:42 PM, Raju Bairishetti wrote: > > describe dummy; > > OK > > sample string > > year

Re: Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided

2017-01-17 Thread Michael Allman
What version of Spark are you running? > On Jan 17, 2017, at 8:42 PM, Raju Bairishetti wrote: > > describe dummy; > > OK > > sample string > > year string > > month

Re: Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided

2017-01-17 Thread Raju Bairishetti
describe dummy;
OK
sample    string
year      string
month     string
# Partition Information
# col_name    data_type    comment
year      string
month     string
val df = sqlContext.sql("select count(1) from rajub.dummy

Re: Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided

2017-01-17 Thread Michael Allman
Can you paste the actual query plan here, please? > On Jan 17, 2017, at 7:38 PM, Raju Bairishetti wrote: > > > On Wed, Jan 18, 2017 at 11:13 AM, Michael Allman > wrote: > What is the physical query plan after you set >

Re: Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided

2017-01-17 Thread Raju Bairishetti
On Wed, Jan 18, 2017 at 11:13 AM, Michael Allman wrote: > What is the physical query plan after you set > spark.sql.hive.convertMetastoreParquet > to true? > Physical plan contains all the partition locations > > Michael > > On Jan 17, 2017, at 6:51 PM, Raju Bairishetti

Re: Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided

2017-01-17 Thread Michael Allman
What is the physical query plan after you set spark.sql.hive.convertMetastoreParquet to true? Michael > On Jan 17, 2017, at 6:51 PM, Raju Bairishetti wrote: > > Thanks Michael for the response. > > > On Wed, Jan 18, 2017 at 2:45 AM, Michael Allman

Re: Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided

2017-01-17 Thread Raju Bairishetti
Thanks Michael for the response. On Wed, Jan 18, 2017 at 2:45 AM, Michael Allman wrote: > Hi Raju, > > I'm sorry this isn't working for you. I helped author this functionality > and will try my best to help. > > First, I'm curious why you set

Re: Limit Query Performance Suggestion

2017-01-17 Thread sujith71955
Dear Liang, Thanks for your valuable feedback. There was a mistake in the previous post; I corrected it. As you mentioned, with `GlobalLimit` we will only take the required number of rows from the input iterator, which really pulls data from local blocks and remote blocks. But if the limit value
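For context, a sketch of where GlobalLimit appears in a physical plan; the plan shape in the comments is indicative only and varies across Spark versions:

    // Sketch: a limit followed by further operators is planned as a per-partition
    // LocalLimit feeding a GlobalLimit across partitions.
    val limited = spark.range(1000000L).limit(100).selectExpr("id * 2 as doubled")
    limited.explain()
    // Indicative shape:
    //   Project [id * 2 AS doubled]
    //   +- GlobalLimit 100
    //      +- Exchange SinglePartition
    //         +- LocalLimit 100
    //            +- Range (0, 1000000, ...)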

Feedback on MLlib roadmap process proposal

2017-01-17 Thread Joseph Bradley
Hi all, This is a general call for thoughts about the process for the MLlib roadmap proposed in SPARK-18813. See the section called "Roadmap process." Summary: * This process is about committers indicating intention to shepherd and review. * The goal is to improve visibility and communication.

Re: Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided

2017-01-17 Thread Michael Allman
Hi Raju, I'm sorry this isn't working for you. I helped author this functionality and will try my best to help. First, I'm curious why you set spark.sql.hive.convertMetastoreParquet to false? Can you link specifically to the JIRA issue or Spark PR you referred to? The first thing I would try

Re: Both Spark AM and Client are trying to delete Staging Directory

2017-01-17 Thread Rostyslav Sotnychenko
> I think Rostyslav is using a DFS which logs at warn/error if you try to delete a directory that isn't there, so is seeing warning messages that nobody else does Yep, you are correct. > Rostyslav, like I said, I'd be curious as to which DFS/object store you are working with Unfortunately, I am

Re: GraphX-related "open" issues

2017-01-17 Thread Sean Owen
WontFix or Later is fine. There's not really any practical distinction. I figure that if something times out and is closed, it's very unlikely to be looked at again. Therefore marking it as something to do 'later' seemed less accurate. On Tue, Jan 17, 2017 at 5:30 PM Takeshi Yamamuro

Re: GraphX-related "open" issues

2017-01-17 Thread Takeshi Yamamuro
Thanks for your comment! I'm just thinking I'll set "Won't Fix", though "Later" is also okay. But, I re-checked "Contributing to JIRA Maintenance" in the contribution guide (http://spark.apache.org/contributing.html) and I couldn't find any policy about setting "Later". So, IMO it's okay to set

spark main thread quit, but the driver don't crash at standalone cluster

2017-01-17 Thread John Fang
My spark main thread creates some daemon threads, which may be timer threads. Then the spark application throws some exceptions, and the main thread quits. But the driver JVM doesn't exit on a standalone cluster. Of course, the problem doesn't happen on a YARN cluster, because the application

Re: GraphX-related "open" issues

2017-01-17 Thread Dongjoon Hyun
Hi, Takeshi. > So, IMO it seems okay to close tickets about "Improvement" and "New Feature" > for now. I'm just wondering what value you want to fill in the `Resolution` field for those issues. Maybe, 'Later'? Or, 'Won't Fix'? Bests, Dongjoon.

spark main thread quit, but the Jvm of driver don't crash

2017-01-17 Thread John Fang
My spark main thread creates some daemon threads. Then the spark application throws some exceptions, and the main thread quits. But the driver JVM doesn't exit, so what can I do? For example: val sparkConf = new SparkConf().setAppName("NetworkWordCount")
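One common workaround (a sketch, not from the thread) is to catch failures in main and force the JVM down, since other non-daemon threads in the driver can otherwise keep it alive after the main thread dies:

    import org.apache.spark.{SparkConf, SparkContext}

    object NetworkWordCount {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("NetworkWordCount"))
        try {
          // ... application logic that may throw ...
        } catch {
          case t: Throwable =>
            t.printStackTrace()
            sc.stop()      // release executors cleanly
            System.exit(1) // force the driver JVM to exit despite lingering non-daemon threads
        }
        sc.stop()
      }
    }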

Re: Weird experience Hive with Spark Transformations

2017-01-17 Thread Dongjoon Hyun
Hi, Chetan. Did you copy your `hive-site.xml` into Spark conf directory? For example, cp /usr/local/hive/conf/hive-site.xml /usr/local/spark/conf If you want to use the existing Hive metastore, you need to provide that information to Spark. Bests, Dongjoon. On 2017-01-16 21:36 (-0800),

GraphX-related "open" issues

2017-01-17 Thread Takeshi Yamamuro
Hi, devs Sorry to bother you, but please let me check in advance; in JIRA, there are some open (and inactive) issues about GraphX features. IIUC the current GraphX features are almost frozen and will possibly get no modification except for critical bugs. So, IMO it seems okay to close tickets

Re: Both Spark AM and Client are trying to delete Staging Directory

2017-01-17 Thread Steve Loughran
I think Rostyslav is using a DFS which logs at warn/error if you try to delete a directory that isn't there, so is seeing warning messages that nobody else does. Rostyslav, like I said, I'd be curious as to which DFS/object store you are working with, as it is behaving slightly differently from
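A sketch of the exists-guarded delete pattern under discussion, using the Hadoop FileSystem API; the staging path is hypothetical:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    // Sketch: some filesystems log a warning when asked to delete a missing directory,
    // so check for existence first.
    val fs = FileSystem.get(new Configuration())
    val staging = new Path("/user/spark/.sparkStaging/application_1234_0001") // hypothetical
    if (fs.exists(staging)) {
      fs.delete(staging, true) // recursive delete; the guard makes a second attempt a no-op
    }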

Re: Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided

2017-01-17 Thread Raju Bairishetti
Had a high-level look into the code. It seems the getHiveQlPartitions method from HiveMetastoreCatalog is getting called irrespective of the metastorePartitionPruning conf value. It should not fetch all partitions if we set metastorePartitionPruning to true (the default value for this is false). def
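A sketch of how one might test whether the conf changes the plan (Spark 1.6-era API; the table and filter value are illustrative, taken from elsewhere in the thread):

    // Sketch: flip metastore-side pruning on (default false in 1.6) and recheck the plan.
    sqlContext.setConf("spark.sql.hive.metastorePartitionPruning", "true")
    val df = sqlContext.sql("select count(1) from rajub.dummy where year = '2016'")
    df.explain(true) // if pruning takes effect, only year=2016 partition locations should appear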

Re: Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided

2017-01-17 Thread Raju Bairishetti
Hello, Spark SQL is generating a query plan with all partition information even if we apply filters on partitions in the query. Due to this, the spark driver / hive metastore is hitting OOM, as each table has lots of partitions. We can confirm from the hive audit logs that it tries to
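A minimal reproduction sketch of the reported behavior (table and partition column taken from later messages in the thread; the filter value is illustrative):

    // Sketch (Spark 1.5/1.6 with Hive support): a partition filter that should prune.
    val df = sqlContext.sql("select count(1) from rajub.dummy where year = '2016'")
    df.explain(true)
    // Reported behavior: the plan lists the locations of every partition, and the
    // metastore is asked for all partition metadata rather than a filtered subset.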