Re: Error getting response from spark driver rest APIs : java.lang.IncompatibleClassChangeError: Implementing class

2015-12-26 Thread Hokam Singh Chauhan
Hi Rakesh, It looks like an old version of Jersey is being bundled into the shaded jar. Add the dependencies below to your shaded jar; that will resolve the InvocationTargetException issue. jersey-client-1.9 jersey-core-1.9 jersey-json-1.9 jersey-grizzly2-1.9 jersey-guice-1.9 jersey-server-1.9 Regards, Hokam

RE: Spark Streaming + Kafka + scala job message read issue

2015-12-26 Thread Bryan
Vivek, Where you’re using numThreads – look at the documentation for createStream. I believe that number should be the number of partitions to consume. From: vivek.meghanat...@wipro.com Sent: Friday, December 25, 2015 11:39 PM To:
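A minimal sketch of the point about createStream, in Python: the receiver-based Kafka API in Spark 1.x takes a map from topic name to the number of partitions to consume, and each partition is read in its own thread. The helper name `build_topic_map` and the topic name `events` are hypothetical, chosen only for illustration; the actual Spark call is left as a comment because it needs a running StreamingContext.

```python
def build_topic_map(partition_counts):
    """Return a {topic: partitions} dict for createStream's topics argument.

    Each value is the number of partitions to consume for that topic,
    not an arbitrary thread count. Non-positive counts are rejected.
    """
    topics = {}
    for topic, count in partition_counts.items():
        if count < 1:
            raise ValueError("topic %r needs at least one partition" % topic)
        topics[topic] = int(count)
    return topics


# Hypothetical topic with a single partition.
topics = build_topic_map({"events": 1})

# With pyspark (Spark 1.x) available, the map would be passed like this;
# ssc, zk_quorum and group_id are placeholders you would define yourself:
# from pyspark.streaming.kafka import KafkaUtils
# stream = KafkaUtils.createStream(ssc, zk_quorum, group_id, topics)
```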

Re: Problem using limit clause in spark sql

2015-12-26 Thread tiandiwoxin1234
As for 'rdd.zipWithIndex.partitionBy(YourCustomPartitioner)', can I just drop some records using my custom partitioner? Otherwise I still have to call rdd.take() to get exactly 1 records. And repartition is THE expensive operation that I want to work around. Actually, what I expect the

Re: REST Api not working in spark

2015-12-26 Thread Hokam Singh Chauhan
Hi Aman, It looks like an old version of Jersey is being bundled into the shaded jar. Add the Jersey dependencies below to your shaded jar; that will resolve the InvocationTargetException issue. jersey-client-1.9 jersey-core-1.9 jersey-json-1.9 jersey-grizzly2-1.9 jersey-guice-1.9 jersey-server-1.9 Regards,

Re: why one of Stage is into Skipped section instead of Completed

2015-12-26 Thread Silvio Fiorito
Skipped stages result from existing shuffle output of a stage when re-running a transformation. The executors will have the output of the stage in their local dirs and Spark recognizes that, so rather than re-computing, it will start from the following stage. So, this is a good thing in that

Cassandra read throughput using DataStax connector in Spark

2015-12-26 Thread Noorul Islam Kamal Malmiyoda
Hello all, I am using the DataStax connector to read data from Cassandra and write to another Cassandra cluster. The infrastructure is on Amazon. Both clusters have three nodes with a replication factor of 3. But the throughput seems to be very low. It takes 7 minutes to transfer around 2.5 GB/node. I

Re: Stuck with DataFrame df.select("select * from table");

2015-12-26 Thread Eugene Morozov
Chris, thanks. That'd be great to try =) -- Be well! Jean Morozov On Fri, Dec 25, 2015 at 10:50 PM, Chris Fregly wrote: > oh, and it's worth noting that - starting with Spark 1.6 - you'll be able > to just do the following: > > SELECT * FROM json.`/path/to/json/file` > >

Re: REST Api not working in spark

2015-12-26 Thread vivek.meghanathan
Which JRE version are you using? One possibility is that you are running a lower version of the JRE than is required. Regards Vivek On Fri, Dec 25, 2015 at 4:13 pm, aman solanki

1.5.2 prebuilt for 2.4 spark-submit standalone Python scripts not running

2015-12-26 Thread peteranolaN
Hi all, Question from a newbie here about your excellent Spark: I've just installed Spark 1.5.2, pre-built for Hadoop 2.4 and later. I'm trying to go through the introductory documentation using local[4] to begin with. In pyspark, I'm able to use examples such as the simple application at

Re: Spark SQL 1.5.2 missing JDBC driver for PostgreSQL?

2015-12-26 Thread Benjamin Kim
Chris, I have a question about your setup. Does it allow the same usage of Cassandra/HBase data sources? Can I create a table that links to them and can be used by Spark SQL? The reason for asking is that I see the Cassandra connector package included in your script. Thanks, Ben > On Dec 25, 2015, at

Re: How to handle categorical variables in Spark MLlib?

2015-12-26 Thread robert_dodier
hokam chauhan wrote > So how the string value of categorical variable can be converted into > double values for forming the features vector ? Well, the key characteristic of the variables is that their values are not ordered. So the representation you choose has to honor that. If the model is
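One common representation that respects the "values are not ordered" point is one-hot encoding, where each category gets its own 0/1 column so that no category is numerically larger than another. The sketch below is plain Python for illustration only; the reply above is truncated, so this is not necessarily what it goes on to recommend. The helper names and the sample data are hypothetical.

```python
def build_index(values):
    """Map each distinct category to a column position (sorted for determinism)."""
    return {v: i for i, v in enumerate(sorted(set(values)))}


def one_hot(value, index):
    """Encode one category as a vector of doubles with a single 1.0 entry."""
    vec = [0.0] * len(index)
    vec[index[value]] = 1.0
    return vec


# Hypothetical categorical column.
colors = ["red", "green", "blue", "green"]
index = build_index(colors)
encoded = [one_hot(c, index) for c in colors]
```

Each encoded row can then serve as the categorical part of a features vector; equal categories get identical vectors, and all distinct categories are equally far apart.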

Re: ERROR server.TThreadPoolServer: Error occurred during processing of message

2015-12-26 Thread Dasun Hegoda
Yes, didn't work for me On Sun, Dec 27, 2015 at 10:56 AM, Ted Yu wrote: > Have you seen this ? > > > http://stackoverflow.com/questions/30705576/python-cannot-connect-hiveserver2 > > On Sat, Dec 26, 2015 at 9:09 PM, Dasun Hegoda > wrote: > >> I'm

ERROR server.TThreadPoolServer: Error occurred during processing of message

2015-12-26 Thread Dasun Hegoda
I'm running apache-hive-1.2.1-bin and spark-1.5.1-bin-hadoop2.6, with Spark as the Hive engine. When I try to connect through JasperStudio using the thrift port, I get the error below. I'm running Ubuntu 14.04. 15/12/26 23:36:20 ERROR server.TThreadPoolServer: Error occurred during processing of message.

Re: ERROR server.TThreadPoolServer: Error occurred during processing of message

2015-12-26 Thread Ted Yu
Have you seen this ? http://stackoverflow.com/questions/30705576/python-cannot-connect-hiveserver2 On Sat, Dec 26, 2015 at 9:09 PM, Dasun Hegoda wrote: > I'm running apache-hive-1.2.1-bin and spark-1.5.1-bin-hadoop2.6. spark as > the hive engine. When I try to connect

Re: Spark Streaming + Kafka + scala job message read issue

2015-12-26 Thread vivek.meghanathan
Hi Bryan, Yes, we are using only 1 thread per topic, as we have only one Kafka server with 1 partition. What kind of logs will tell us which offset Spark Streaming is reading from Kafka, or whether it is resetting something without reading? Regards Vivek

Re: ERROR server.TThreadPoolServer: Error occurred during processing of message

2015-12-26 Thread Dasun Hegoda
I was able to figure out exactly where the problem is: it's Spark. When I start hiveserver2 manually and run a query, it works fine, but when I try to access Hive through Spark's thrift port it does not work and throws the above-mentioned error. Please help me fix this. On Sun, Dec