RE: nested collection object query

2015-09-29 Thread Tridib Samanta
Well, I figured out a way to use explode, but it returns two rows if there are
two matches in the nested array objects.
 
select id from department LATERAL VIEW explode(employee) dummy_table as emp 
where emp.name = 'employee0'
 
I was looking for an operator that loops through the array, returns true if any
element matches the condition, and returns the parent object once.
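One way to get each matching parent back exactly once is to deduplicate after the explode. A minimal sketch in Scala, assuming a HiveContext bound as hiveCtx and the department table from this thread:

// Minimal sketch, not the original code: DISTINCT collapses the duplicate
// parent rows that LATERAL VIEW explode emits when several array elements match.
val matching = hiveCtx.sql("""
  SELECT DISTINCT d.id
  FROM department d
  LATERAL VIEW explode(d.employee) emp_table AS emp
  WHERE emp.name = 'employee0'
""")
matching.collect().foreach(println)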
From: tridib.sama...@live.com
To: java8...@hotmail.com; user@spark.apache.org
Subject: RE: nested collection object query
Date: Mon, 28 Sep 2015 22:26:46 -0700

Thanks for your response, Yong! The array syntax works fine, but I am not sure
how to use explode. Should I use it as follows?
select id from department where explode(employee).name = 'employee0'
 
This query gives me java.lang.UnsupportedOperationException. I am using
HiveContext.
 
From: java8...@hotmail.com
To: tridib.sama...@live.com; user@spark.apache.org
Subject: RE: nested collection object query
Date: Mon, 28 Sep 2015 20:42:11 -0400

Your employee is in fact an array of structs, not just a struct.
If you are using HiveContext, you can reference it like the following:
select id from member where employee[0].name = 'employee0'
Here employee[0] points to the first element of the array.
If you want to query all the elements in the array, you have to use
"explode" in Hive.
See the Hive documentation on explode:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-explode
Yong

> Date: Mon, 28 Sep 2015 16:37:23 -0700
> From: tridib.sama...@live.com
> To: user@spark.apache.org
> Subject: nested collection object query
> 
> Hi Friends,
> What is the right syntax to query a collection of nested objects? I have the
> following schema and SQL, but it does not return anything. Is the syntax
> correct?
> 
> root
>  |-- id: string (nullable = false)
>  |-- employee: array (nullable = false)
>  ||-- element: struct (containsNull = true)
>  |||-- id: string (nullable = false)
>  |||-- name: string (nullable = false)
>  |||-- speciality: string (nullable = false)
> 
> 
> select id from member where employee.name = 'employee0'
> 
> Uploaded a test if someone wants to try it out. NestedObjectTest.java
> 
>   
> 
> 
> 
> 
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/nested-collection-object-query-tp24853.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
> 
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
> 

  


RE: Official Docker container for Spark

2015-05-29 Thread Tridib Samanta
Thanks, all, for your replies. I was evaluating which one fits best for me. I
picked epahomov/docker-spark from the Docker registry and it suffices my needs.
 
Thanks
Tridib
 
Date: Fri, 22 May 2015 14:15:42 +0530
Subject: Re: Official Docker container for Spark
From: riteshoneinamill...@gmail.com
To: 917361...@qq.com
CC: tridib.sama...@live.com; user@spark.apache.org

Use this:
sequenceiq/docker

Here's a link to their github repo:
docker-spark


They have repos for other big data tools too, which are again really nice. It's
being maintained properly by their devs and
  

RE: HBase HTable constructor hangs

2015-04-30 Thread Tridib Samanta
You are right. After I moved from HBase 0.98.1 to 1.0.0, this problem was
solved. Thanks, all!
 
Date: Wed, 29 Apr 2015 06:58:59 -0700
Subject: Re: HBase HTable constructor hangs
From: yuzhih...@gmail.com
To: tridib.sama...@live.com
CC: d...@ocirs.com; user@spark.apache.org

Can you verify whether the HBase release you're using has the following fix?
HBASE-8 non environment variable solution for IllegalAccessError

Cheers
On Tue, Apr 28, 2015 at 10:47 PM, Tridib Samanta tridib.sama...@live.com 
wrote:

I turned on TRACE logging and I see a lot of the following exception:
 
java.lang.IllegalAccessError: com/google/protobuf/ZeroCopyLiteralByteString
 at 
org.apache.hadoop.hbase.protobuf.RequestConverter.buildRegionSpecifier(RequestConverter.java:897)
 at 
org.apache.hadoop.hbase.protobuf.RequestConverter.buildGetRowOrBeforeRequest(RequestConverter.java:131)
 at 
org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRowOrBefore(ProtobufUtil.java:1402)
 at org.apache.hadoop.hbase.client.HTable$2.call(HTable.java:701)
 at org.apache.hadoop.hbase.client.HTable$2.call(HTable.java:699)
 at 
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:120)
 at org.apache.hadoop.hbase.client.HTable.getRowOrBefore(HTable.java:705)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:144)
 at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:1102)
 at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1162)
 at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1054)
 at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1011)
 at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:326)
 at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:192)
 
Thanks
Tridib
 
From: d...@ocirs.com
Date: Tue, 28 Apr 2015 22:24:39 -0700
Subject: Re: HBase HTable constructor hangs
To: tridib.sama...@live.com

In that case, something else is failing, and the reason HBase looks like it
hangs is that the HBase timeout or retry count is too high.
Try setting the following conf; HBase will then hang for only a few minutes at
most and return a helpful error message.
hbaseConf.set(HConstants.HBASE_CLIENT_RETRIES_NUMBER, "2")
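For reference, a minimal Scala sketch of this suggestion against the HBase 0.98/1.0 client API; the RPC timeout setting and the table name are illustrative additions, not from the original mail:

import org.apache.hadoop.hbase.{HBaseConfiguration, HConstants}
import org.apache.hadoop.hbase.client.HTable

// Bound retries and the per-RPC timeout so a misconfigured connection
// fails fast with an error instead of appearing to hang.
val hbaseConf = HBaseConfiguration.create()
hbaseConf.setInt(HConstants.HBASE_CLIENT_RETRIES_NUMBER, 2)
hbaseConf.setInt(HConstants.HBASE_RPC_TIMEOUT_KEY, 10000) // 10 seconds
val table = new HTable(hbaseConf, "my_table") // "my_table" is a placeholder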



--
Dean Chen


On Tue, Apr 28, 2015 at 10:18 PM, Tridib Samanta tridib.sama...@live.com 
wrote:

Nope, my hbase is unsecured.
 
From: d...@ocirs.com
Date: Tue, 28 Apr 2015 22:09:51 -0700
Subject: Re: HBase HTable constructor hangs
To: tridib.sama...@live.com

Hi Tridib,
Are you running this on a secure Hadoop/HBase cluster? I ran into a similar
issue where the HBase client can successfully connect in local mode and in the
yarn-client driver, but not on remote executors. The problem is that Spark
doesn't distribute the HBase auth key; see the following JIRA ticket and PR.
https://issues.apache.org/jira/browse/SPARK-6918

--
Dean Chen


On Tue, Apr 28, 2015 at 9:34 PM, Tridib Samanta tridib.sama...@live.com wrote:

I am not 100% sure how it's picking up the configuration. I copied
hbase-site.xml into the HDFS/Spark cluster (single machine). I also included
hbase-site.xml in the spark-job jar file; the jar also contains yarn-site.xml,
mapred-site.xml, and core-site.xml.
 
One interesting thing: when I run the spark-job jar standalone and execute the
HBase client from a main method, it works fine. The same client is unable to
connect (hangs) when the jar is distributed via Spark.
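A minimal sketch of one way to take classpath pickup out of the equation: load hbase-site.xml explicitly on the executor side. The path and the sanity-check property below are illustrative; the file can be shipped to executors with spark-submit --files hbase-site.xml:

import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.HBaseConfiguration

// Add hbase-site.xml as an explicit resource instead of relying on it
// being found on the classpath, then verify what was actually loaded.
val hbaseConf = HBaseConfiguration.create()
hbaseConf.addResource(new Path("hbase-site.xml")) // illustrative path
println(hbaseConf.get("hbase.zookeeper.quorum"))  // sanity check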
 
Thanks
Tridib
 
Date: Tue, 28 Apr 2015 21:25:41 -0700
Subject: Re: HBase HTable constructor hangs
From: yuzhih...@gmail.com
To: tridib.sama...@live.com
CC: user@spark.apache.org

How did you distribute hbase-site.xml to the nodes?
Looks like HConnectionManager couldn't find the hbase:meta server.
Cheers
On Tue, Apr 28, 2015 at 9:19 PM, Tridib Samanta tridib.sama...@live.com wrote:

I am using Spark 1.2.0 and HBase 0.98.1-cdh5.1.0.
 
Here is the jstack trace. Complete stack trace attached.
 
Executor task launch worker-1 #58 daemon prio=5 os_prio=0 
tid=0x7fd3d0445000 nid=0x488 waiting on condition [0x7fd4507d9000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
 at java.lang.Thread.sleep(Native Method)
 at 
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:152)
 - locked 0xf8cb7258 (a 
org.apache.hadoop.hbase.client.RpcRetryingCaller)
 at org.apache.hadoop.hbase.client.HTable.getRowOrBefore(HTable.java:705)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:144)
 at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:1102)
 at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1162)
 - locked 0xf84ac0b0

RE: HBase HTable constructor hangs

2015-04-28 Thread Tridib Samanta
I am using Spark 1.2.0 and HBase 0.98.1-cdh5.1.0.
 
Here is the jstack trace. Complete stack trace attached.
 
Executor task launch worker-1 #58 daemon prio=5 os_prio=0 
tid=0x7fd3d0445000 nid=0x488 waiting on condition [0x7fd4507d9000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
 at java.lang.Thread.sleep(Native Method)
 at 
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:152)
 - locked 0xf8cb7258 (a 
org.apache.hadoop.hbase.client.RpcRetryingCaller)
 at org.apache.hadoop.hbase.client.HTable.getRowOrBefore(HTable.java:705)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:144)
 at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:1102)
 at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1162)
 - locked 0xf84ac0b0 (a java.lang.Object)
 at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1054)
 at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1011)
 at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:326)
 at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:192)
 at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:150)
 at com.mypackage.storeTuples(CubeStoreService.java:59)
 at 
com.mypackage.StorePartitionToHBaseStoreFunction.call(StorePartitionToHBaseStoreFunction.java:23)
 at 
com.mypackage.StorePartitionToHBaseStoreFunction.call(StorePartitionToHBaseStoreFunction.java:13)
 at 
org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:195)
 at 
org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:195)
 at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:773)
 at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:773)
 at 
org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1314)
 at 
org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1314)
 at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
 at org.apache.spark.scheduler.Task.run(Task.scala:56)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
Executor task launch worker-0 #57 daemon prio=5 os_prio=0 
tid=0x7fd3d0443800 nid=0x487 waiting for monitor entry [0x7fd4506d8000]
   java.lang.Thread.State: BLOCKED (on object monitor)
 at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1156)
 - waiting to lock 0xf84ac0b0 (a java.lang.Object)
 at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1054)
 at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1011)
 at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:326)
 at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:192)
 at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:150)
 at com.mypackage.storeTuples(CubeStoreService.java:59)
 at 
com.mypackage.StorePartitionToHBaseStoreFunction.call(StorePartitionToHBaseStoreFunction.java:23)
 at 
com.mypackage.StorePartitionToHBaseStoreFunction.call(StorePartitionToHBaseStoreFunction.java:13)
 at 
org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:195)
 at 
org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:195)
 at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:773)
 at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:773)
 at 
org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1314)
 at 
org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1314)
 at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
 at org.apache.spark.scheduler.Task.run(Task.scala:56)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 
Date: Tue, 28 Apr 2015 19:35:26 -0700
Subject: Re: HBase HTable constructor hangs
From: yuzhih...@gmail.com
To: tridib.sama...@live.com
CC: user@spark.apache.org

Can you give us more information, such as the HBase and Spark releases?
If you can pastebin jstack of the hanging HTable process, that would help.
BTW I used 


RE: group by order by fails

2015-02-27 Thread Tridib Samanta
Thanks, Michael! It worked. Somehow my mails are not getting accepted by the
Spark user mailing list. :(
 
From: mich...@databricks.com
Date: Thu, 26 Feb 2015 17:49:43 -0800
Subject: Re: group by order by fails
To: tridib.sama...@live.com
CC: ak...@sigmoidanalytics.com; user@spark.apache.org

Assign an alias to the count in the select clause and use that alias in the 
order by clause.
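A minimal sketch of the suggested rewrite, assuming the sample table from the original question and a SQL context bound as sqlContext; the alias cnt and the LIMIT 10 (for the top-10 requirement) are illustrative:

// Alias the aggregate in the SELECT clause and order by the alias.
val top10 = sqlContext.sql("""
  SELECT s.name, COUNT(s.name) AS cnt
  FROM sample s
  GROUP BY s.name
  ORDER BY cnt DESC
  LIMIT 10
""")
top10.collect().foreach(println)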
On Wed, Feb 25, 2015 at 11:17 PM, Tridib Samanta tridib.sama...@live.com 
wrote:

Actually, I just realized I am using 1.2.0.
 
Thanks
Tridib
 
Date: Thu, 26 Feb 2015 12:37:06 +0530
Subject: Re: group by order by fails
From: ak...@sigmoidanalytics.com
To: tridib.sama...@live.com
CC: user@spark.apache.org

Which version of Spark are you using? It seems there was a similar JIRA:
https://issues.apache.org/jira/browse/SPARK-2474
Thanks
Best Regards

On Thu, Feb 26, 2015 at 12:03 PM, tridib tridib.sama...@live.com wrote:
Hi,

I need to find the top 10 best-selling samples, so the query looks like:

select  s.name, count(s.name) from sample s group by s.name order by

count(s.name)



This query fails with the following error:

org.apache.spark.sql.catalyst.errors.package$TreeNodeException: sort, tree:

Sort [COUNT(name#0) ASC], true

 Exchange (RangePartitioning [COUNT(name#0) ASC], 200)

  Aggregate false, [name#0], [name#0 AS

name#1,Coalesce(SUM(PartialCount#4L),0) AS count#2L,name#0]

   Exchange (HashPartitioning [name#0], 200)

Aggregate true, [name#0], [name#0,COUNT(name#0) AS PartialCount#4L]

 PhysicalRDD [name#0], MapPartitionsRDD[1] at mapPartitions at

JavaSQLContext.scala:102



at

org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:47)

at org.apache.spark.sql.execution.Sort.execute(basicOperators.scala:206)

at 
org.apache.spark.sql.execution.Project.execute(basicOperators.scala:43)

at

org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:84)

at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:444)

at

org.apache.spark.sql.api.java.JavaSchemaRDD.collect(JavaSchemaRDD.scala:114)

at

com.edifecs.platform.df.analytics.spark.domain.dao.OrderByTest.testGetVisitDistributionByPrimaryDx(OrderByTest.java:48)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at

sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

at

sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at

org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)

at

org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)

at

org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)

at

org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)

at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)

at

org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)

at

org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)

at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)

at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)

at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)

at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)

at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)

at org.junit.runners.ParentRunner.run(ParentRunner.java:309)

at org.junit.runner.JUnitCore.run(JUnitCore.java:160)

at

com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74)

at

com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:211)

at 
com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:67)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at

sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

at

sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at

sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

at

com.intellij.rt.execution.CommandLineWrapper.main(CommandLineWrapper.java:121)

Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException:

execute, tree:

Exchange (RangePartitioning [COUNT(name#0) ASC], 200)

 Aggregate false, [name#0], [name#0 AS

name#1,Coalesce(SUM(PartialCount#4L),0) AS count#2L,name#0]

  Exchange (HashPartitioning [name#0], 200)

   Aggregate true, [name#0], [name#0,COUNT(name#0) AS PartialCount#4L]

PhysicalRDD [name#0], MapPartitionsRDD[1] at mapPartitions at

JavaSQLContext.scala

RE: group by order by fails

2015-02-25 Thread Tridib Samanta
Actually, I just realized I am using 1.2.0.
 
Thanks
Tridib
 
Date: Thu, 26 Feb 2015 12:37:06 +0530
Subject: Re: group by order by fails
From: ak...@sigmoidanalytics.com
To: tridib.sama...@live.com
CC: user@spark.apache.org

Which version of Spark are you using? It seems there was a similar JIRA:
https://issues.apache.org/jira/browse/SPARK-2474
Thanks
Best Regards

On Thu, Feb 26, 2015 at 12:03 PM, tridib tridib.sama...@live.com wrote:
Hi,

I need to find the top 10 best-selling samples, so the query looks like:

select  s.name, count(s.name) from sample s group by s.name order by

count(s.name)



This query fails with the following error:

org.apache.spark.sql.catalyst.errors.package$TreeNodeException: sort, tree:

Sort [COUNT(name#0) ASC], true

 Exchange (RangePartitioning [COUNT(name#0) ASC], 200)

  Aggregate false, [name#0], [name#0 AS

name#1,Coalesce(SUM(PartialCount#4L),0) AS count#2L,name#0]

   Exchange (HashPartitioning [name#0], 200)

Aggregate true, [name#0], [name#0,COUNT(name#0) AS PartialCount#4L]

 PhysicalRDD [name#0], MapPartitionsRDD[1] at mapPartitions at

JavaSQLContext.scala:102



at

org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:47)

at org.apache.spark.sql.execution.Sort.execute(basicOperators.scala:206)

at 
org.apache.spark.sql.execution.Project.execute(basicOperators.scala:43)

at

org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:84)

at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:444)

at

org.apache.spark.sql.api.java.JavaSchemaRDD.collect(JavaSchemaRDD.scala:114)

at

com.edifecs.platform.df.analytics.spark.domain.dao.OrderByTest.testGetVisitDistributionByPrimaryDx(OrderByTest.java:48)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at

sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

at

sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at

org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)

at

org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)

at

org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)

at

org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)

at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)

at

org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)

at

org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)

at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)

at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)

at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)

at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)

at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)

at org.junit.runners.ParentRunner.run(ParentRunner.java:309)

at org.junit.runner.JUnitCore.run(JUnitCore.java:160)

at

com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74)

at

com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:211)

at 
com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:67)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at

sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

at

sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at

sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

at

com.intellij.rt.execution.CommandLineWrapper.main(CommandLineWrapper.java:121)

Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException:

execute, tree:

Exchange (RangePartitioning [COUNT(name#0) ASC], 200)

 Aggregate false, [name#0], [name#0 AS

name#1,Coalesce(SUM(PartialCount#4L),0) AS count#2L,name#0]

  Exchange (HashPartitioning [name#0], 200)

   Aggregate true, [name#0], [name#0,COUNT(name#0) AS PartialCount#4L]

PhysicalRDD [name#0], MapPartitionsRDD[1] at mapPartitions at

JavaSQLContext.scala:102



at

org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:47)

at org.apache.spark.sql.execution.Exchange.execute(Exchange.scala:47)

at

org.apache.spark.sql.execution.Sort$$anonfun$execute$3.apply(basicOperators.scala:207)

at

org.apache.spark.sql.execution.Sort$$anonfun$execute$3.apply(basicOperators.scala:207)

at

org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:46)


sql - group by on UDF not working

2014-11-07 Thread Tridib Samanta
I am trying to group by on a calculated field. Is this supported in Spark SQL?
I am running it on a nested JSON structure.
 
Query: SELECT YEAR(c.Patient.DOB), sum(c.ClaimPay.TotalPayAmnt) FROM claim c 
group by YEAR(c.Patient.DOB)
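A common workaround for this class of error on older Spark versions is to compute the UDF in a subquery and group by the resulting column. A minimal sketch, assuming the HiveContext from this thread bound as sqlContext (the aliases are illustrative):

// Move YEAR() into an inner query so the outer GROUP BY references a
// plain column rather than the UDF expression.
val byYear = sqlContext.sql("""
  SELECT dobYear, SUM(totalPay)
  FROM (SELECT YEAR(c.Patient.DOB) AS dobYear,
               c.ClaimPay.TotalPayAmnt AS totalPay
        FROM claim c) t
  GROUP BY dobYear
""")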
 
Spark version: spark-1.2.0-SNAPSHOT with Hive and Hadoop 2.4.
Error: 
 
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Expression not 
in GROUP BY: HiveSimpleUdf#org.apache.hadoop.hive.ql.udf.UDFYear(Patient#8.DOB 
AS DOB#191) AS c_0#185, tree:
Aggregate [HiveSimpleUdf#org.apache.hadoop.hive.ql.udf.UDFYear(Patient#8.DOB)], 
[HiveSimpleUdf#org.apache.hadoop.hive.ql.udf.UDFYear(Patient#8.DOB AS DOB#191) 
AS c_0#185,SUM(CAST(ClaimPay#5.TotalPayAmnt AS TotalPayAmnt#192, LongType)) AS 
c_1#186L]
 Subquery c
  Subquery claim
   LogicalRDD 
[AttendPhysician#0,BillProv#1,Claim#2,ClaimClinic#3,ClaimInfo#4,ClaimPay#5,ClaimTL#6,OpPhysician#7,Patient#8,PayToPhysician#9,Payer#10,Physician#11,RefProv#12,Services#13,Subscriber#14],
 MappedRDD[5] at map at JsonRDD.scala:43
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$CheckAggregation$$anonfun$apply$3$$anonfun$applyOrElse$6.apply(Analyzer.scala:127)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$CheckAggregation$$anonfun$apply$3$$anonfun$applyOrElse$6.apply(Analyzer.scala:125)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$CheckAggregation$$anonfun$apply$3.applyOrElse(Analyzer.scala:125)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$CheckAggregation$$anonfun$apply$3.applyOrElse(Analyzer.scala:115)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:135)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$CheckAggregation$.apply(Analyzer.scala:115)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$CheckAggregation$.apply(Analyzer.scala:113)
at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)
at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)
at 
scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
at 
scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)
at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)
at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)
at scala.collection.immutable.List.foreach(List.scala:318)
at 
org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
at 
org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:411)
at 
org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:411)
at 
org.apache.spark.sql.SQLContext$QueryExecution.withCachedData$lzycompute(SQLContext.scala:412)
at 
org.apache.spark.sql.SQLContext$QueryExecution.withCachedData(SQLContext.scala:412)
at 
org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:413)
at 
org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:413)
at 
org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:418)
at 
org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:416)
at 
org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:422)
at 
org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:422)
at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:423)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:17)
at $iwC$$iwC$$iwC.<init>(<console>:22)
at $iwC$$iwC.<init>(<console>:24)
at $iwC.<init>(<console>:26)
at <init>(<console>:28)
at .<init>(<console>:32)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:852)
at 
org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1125)
at 
org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:674)
at 

RE: spark sql: join sql fails after sqlCtx.cacheTable()

2014-11-06 Thread Tridib Samanta
I am getting an exception in spark-shell at the following line:
 
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
error: bad symbolic reference. A signature in HiveContext.class refers to term 
hive
in package org.apache.hadoop which is not available.
It may be completely missing from the current classpath, or the version on
the classpath might be incompatible with the version used when compiling 
HiveContext.class.
error:
 while compiling: <console>
during phase: erasure
 library version: version 2.10.4
compiler version: version 2.10.4
  reconstructed args:
  last tree to typer: Apply(value $outer)
  symbol: value $outer (flags: method synthetic stable 
expandedname triedcooking)
   symbol definition: val $outer(): $iwC.$iwC.type
 tpe: $iwC.$iwC.type
   symbol owners: value $outer - class $iwC - class $iwC - class $iwC - 
class $read - package $line5
  context owners: class $iwC - class $iwC - class $iwC - class $iwC - 
class $read - package $line5
== Enclosing template or block ==
ClassDef( // class $iwC extends Serializable
  0
  $iwC
  []
  Template( // val local $iwC: notype, tree.tpe=$iwC
java.lang.Object, scala.Serializable // parents
ValDef(
  private
  _
  tpt
  empty
)
// 5 statements
DefDef( // def init(arg$outer: $iwC.$iwC.$iwC.type): $iwC
  method triedcooking
  init
  []
  // 1 parameter list
  ValDef( // $outer: $iwC.$iwC.$iwC.type
param
$outer
tpt // tree.tpe=$iwC.$iwC.$iwC.type
empty
  )
  tpt // tree.tpe=$iwC
  Block( // tree.tpe=Unit
Apply( // def init(): Object in class Object, tree.tpe=Object
  $iwC.super.init // def init(): Object in class Object, 
tree.tpe=()Object
  Nil
)
()
  )
)
ValDef( // private[this] val sqlContext: 
org.apache.spark.sql.hive.HiveContext
  private local triedcooking
  sqlContext 
  tpt // tree.tpe=org.apache.spark.sql.hive.HiveContext
  Apply( // def init(sc: org.apache.spark.SparkContext): 
org.apache.spark.sql.hive.HiveContext in class HiveContext, 
tree.tpe=org.apache.spark.sql.hive.HiveContext
new org.apache.spark.sql.hive.HiveContext.init // def init(sc: 
org.apache.spark.SparkContext): org.apache.spark.sql.hive.HiveContext in class 
HiveContext, tree.tpe=(sc: 
org.apache.spark.SparkContext)org.apache.spark.sql.hive.HiveContext
Apply( // val sc(): org.apache.spark.SparkContext, 
tree.tpe=org.apache.spark.SparkContext
  
$iwC.this.$line5$$read$$iwC$$iwC$$iwC$$iwC$$$outer().$line5$$read$$iwC$$iwC$$iwC$$$outer().$line5$$read$$iwC$$iwC$$$outer().$VAL1().$iw().$iw().sc
 // val sc(): org.apache.spark.SparkContext, 
tree.tpe=()org.apache.spark.SparkContext
  Nil
)
  )
)
DefDef( // val sqlContext(): org.apache.spark.sql.hive.HiveContext
  method stable accessor
  sqlContext
  []
  List(Nil)
  tpt // tree.tpe=org.apache.spark.sql.hive.HiveContext
  $iwC.this.sqlContext  // private[this] val sqlContext: 
org.apache.spark.sql.hive.HiveContext, 
tree.tpe=org.apache.spark.sql.hive.HiveContext
)
ValDef( // protected val $outer: $iwC.$iwC.$iwC.type
  protected synthetic paramaccessor triedcooking
  $outer 
  tpt // tree.tpe=$iwC.$iwC.$iwC.type
  empty
)
DefDef( // val $outer(): $iwC.$iwC.$iwC.type
  method synthetic stable expandedname triedcooking
  $line5$$read$$iwC$$iwC$$iwC$$iwC$$$outer
  []
  List(Nil)
  tpt // tree.tpe=Any
  $iwC.this.$outer  // protected val $outer: $iwC.$iwC.$iwC.type, 
tree.tpe=$iwC.$iwC.$iwC.type
)
  )
)
== Expanded type of tree ==
ThisType(class $iwC)
uncaught exception during compilation: scala.reflect.internal.Types$TypeError
scala.reflect.internal.Types$TypeError: bad symbolic reference. A signature in 
HiveContext.class refers to term conf
in value org.apache.hadoop.hive which is not available.
It may be completely missing from the current classpath, or the version on
the classpath might be incompatible with the version used when compiling 
HiveContext.class.
That entry seems to have slain the compiler.  Shall I replay
your session? I can re-run each line except the last one.

 
Thanks
Tridib
 
Date: Tue, 21 Oct 2014 09:39:49 -0700
Subject: Re: spark sql: join sql fails after sqlCtx.cacheTable()
From: ri...@infoobjects.com
To: tridib.sama...@live.com
CC: u...@spark.incubator.apache.org

Hi Tridib,
I changed SQLContext to HiveContext and it started working. These are the steps
I used.
used.


val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
val person = sqlContext.jsonFile("json/person.json")
person.printSchema()
person.registerTempTable("person")
val address = sqlContext.jsonFile("json/address.json")
address.printSchema()
address.registerTempTable("address")
sqlContext.cacheTable("person")
sqlContext.cacheTable("address")
val rs2 = sqlContext.sql("select p.id, p.name, a.city from person

RE: Unable to use HiveContext in spark-shell

2014-11-06 Thread Tridib Samanta
I am using Spark 1.1.0.
I built it using:
./make-distribution.sh -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -DskipTests
 
My ultimate goal is to execute a query on a Parquet file with a nested structure
and cast a date string to a Date; this is required to calculate the age of the
Person entity. But I am unable to get past even this line:
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
I made sure that the org.apache.hadoop package is in the Spark assembly jar.
Re-attaching the stack trace for quick reference.

scala> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

error: bad symbolic reference. A signature in HiveContext.class refers to term 
hive 
in package org.apache.hadoop which is not available. 
It may be completely missing from the current classpath, or the version on 
the classpath might be incompatible with the version used when compiling 
HiveContext.class. 
error: 
 while compiling: <console>
during phase: erasure 
 library version: version 2.10.4 
compiler version: version 2.10.4 
  reconstructed args: 

  last tree to typer: Apply(value $outer) 
  symbol: value $outer (flags: method synthetic stable 
expandedname triedcooking) 
   symbol definition: val $outer(): $iwC.$iwC.type 
 tpe: $iwC.$iwC.type 
   symbol owners: value $outer - class $iwC - class $iwC - class $iwC - 
class $read - package $line5 
  context owners: class $iwC - class $iwC - class $iwC - class $iwC - 
class $read - package $line5 

== Enclosing template or block == 

ClassDef( // class $iwC extends Serializable 
  0 
  $iwC 
  [] 
  Template( // val local $iwC: notype, tree.tpe=$iwC 
java.lang.Object, scala.Serializable // parents 
ValDef( 
  private 
  _ 
  tpt
  empty
) 
// 5 statements 
DefDef( // def init(arg$outer: $iwC.$iwC.$iwC.type): $iwC 
  method triedcooking
  init 
  [] 
  // 1 parameter list 
  ValDef( // $outer: $iwC.$iwC.$iwC.type 

$outer 
tpt // tree.tpe=$iwC.$iwC.$iwC.type 
empty
  ) 
  tpt // tree.tpe=$iwC 
  Block( // tree.tpe=Unit 
Apply( // def init(): Object in class Object, tree.tpe=Object 
  $iwC.super.init // def init(): Object in class Object, 
tree.tpe=()Object 
  Nil 
) 
() 
  ) 
) 
ValDef( // private[this] val sqlContext: 
org.apache.spark.sql.hive.HiveContext 
  private local triedcooking
  sqlContext  
  tpt // tree.tpe=org.apache.spark.sql.hive.HiveContext 
  Apply( // def init(sc: org.apache.spark.SparkContext): 
org.apache.spark.sql.hive.HiveContext in class HiveContext, 
tree.tpe=org.apache.spark.sql.hive.HiveContext 
new org.apache.spark.sql.hive.HiveContext.init // def init(sc: 
org.apache.spark.SparkContext): org.apache.spark.sql.hive.HiveContext in class 
HiveContext, tree.tpe=(sc: 
org.apache.spark.SparkContext)org.apache.spark.sql.hive.HiveContext 
Apply( // val sc(): org.apache.spark.SparkContext, 
tree.tpe=org.apache.spark.SparkContext 
  
$iwC.this.$line5$$read$$iwC$$iwC$$iwC$$iwC$$$outer().$line5$$read$$iwC$$iwC$$iwC$$$outer().$line5$$read$$iwC$$iwC$$$outer().$VAL1().$iw().$iw().sc
 // val sc(): org.apache.spark.SparkContext, 
tree.tpe=()org.apache.spark.SparkContext 
  Nil 
) 
  ) 
) 
DefDef( // val sqlContext(): org.apache.spark.sql.hive.HiveContext 
  method stable accessor
  sqlContext 
  [] 
  List(Nil) 
  tpt // tree.tpe=org.apache.spark.sql.hive.HiveContext 
  $iwC.this.sqlContext  // private[this] val sqlContext: 
org.apache.spark.sql.hive.HiveContext, 
tree.tpe=org.apache.spark.sql.hive.HiveContext 
) 
ValDef( // protected val $outer: $iwC.$iwC.$iwC.type 
  protected synthetic paramaccessor triedcooking
  $outer  
  tpt // tree.tpe=$iwC.$iwC.$iwC.type 
  empty
) 
DefDef( // val $outer(): $iwC.$iwC.$iwC.type 
  method synthetic stable expandedname triedcooking
  $line5$$read$$iwC$$iwC$$iwC$$iwC$$$outer 
  [] 
  List(Nil) 
  tpt // tree.tpe=Any 
  $iwC.this.$outer  // protected val $outer: $iwC.$iwC.$iwC.type, 
tree.tpe=$iwC.$iwC.$iwC.type 
) 
  ) 
) 

== Expanded type of tree == 

ThisType(class $iwC) 

uncaught exception during compilation: scala.reflect.internal.Types$TypeError 
scala.reflect.internal.Types$TypeError: bad symbolic reference. A signature in 
HiveContext.class refers to term conf 
in value org.apache.hadoop.hive which is not available. 
It may be completely missing from the current classpath, or the version on 
the classpath might be incompatible with the version used when compiling 
HiveContext.class. 
That entry seems to have slain the compiler.  Shall I replay 
your session? I can re-run each line except the last one. 
[y/n] 

 
Thanks
Tridib
 
 From: terry@smartfocus.com
 To: tridib.sama...@live.com; u...@spark.incubator.apache.org
 Subject: Re: Unable to use