Re: Spark 1.6.1 : SPARK-12089 : java.lang.NegativeArraySizeException

2016-03-13 Thread Ted Yu
Here is the related code:

    final int length = totalSize() + neededSize;
    if (buffer.length < length) {
      // This will not happen frequently, because the buffer is re-used.
      final byte[] tmp = new byte[length * 2];

Looks like length was positive (since it was bigger than buffer.length).
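
For context, length * 2 is 32-bit int arithmetic, so once length exceeds Integer.MAX_VALUE / 2 the doubled size wraps around to a negative number and the array allocation throws NegativeArraySizeException. A minimal Scala sketch of just that arithmetic (not Spark's actual code path):

    // A positive length can still produce a negative allocation size
    // once doubling overflows a 32-bit Int.
    val length: Int = 1200000000    // positive, but > Int.MaxValue / 2
    val doubled: Int = length * 2   // wraps to -1894967296
    // new Array[Byte](doubled)     // would throw NegativeArraySizeException
    println(doubled)                // prints -1894967296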

spark streaming web ui remains completed jobs as active jobs

2016-03-13 Thread t7s
I am sure these jobs completed according to the log. For example, job 9816. This is the info about job 9816 in the log: [2016-03-14 07:15:05,088] INFO Job 9816 finished: transform at

Spark 1.6.1 : SPARK-12089 : java.lang.NegativeArraySizeException

2016-03-13 Thread Ravindra Rawat
Greetings, I am getting the following exception on joining a few parquet files. The SPARK-12089 description has details of the overflow condition, which is marked as fixed in 1.6.1. I recall seeing another issue related to CSV files creating the same exception. Any pointers on how to debug this or possible

Hive Query on Spark fails with OOM

2016-03-13 Thread Prabhu Joseph
Hi All, A Hive join query which runs fine and faster in MapReduce takes a lot of time with Spark and finally fails with OOM.

*Query: hivejoin.py*

    from pyspark import SparkContext, SparkConf
    from pyspark.sql import HiveContext
    conf = SparkConf().setAppName("Hive_Join")
    sc =
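
For reference, a rough Scala equivalent of the reported setup (a sketch only; the original thread uses PySpark, and the table and column names here are placeholders):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val conf = new SparkConf().setAppName("Hive_Join")
    val sc = new SparkContext(conf)
    val hc = new HiveContext(sc)

    // A shuffle-heavy join like this can OOM executors on Spark 1.3
    // if executor memory is small relative to the shuffled partitions.
    val result = hc.sql("SELECT a.id, b.val FROM table_a a JOIN table_b b ON a.id = b.id")
    println(result.count())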

Re: append rows to dataframe

2016-03-13 Thread Ted Yu
dffiltered = unionAll(dfresult.filter(dfFilterSQLs(i)).select("Col1","Col2","Col3","Col4","Col5")) FYI On Sun, Mar 13, 2016 at 8:50 PM, Ted Yu wrote: > Have you tried the unionAll() method of DataFrame? > > On Sun, Mar 13, 2016 at 8:44 PM, Divya Gehlot

Can someone fix this download URL?

2016-03-13 Thread Akhil Das
http://d3kbcqa49mib13.cloudfront.net/spark-1.6.1-bin-hadoop2.6.tgz There's a broken link for the Spark 1.6.1 prebuilt for Hadoop 2.6 direct download. Thanks Best Regards

Re: append rows to dataframe

2016-03-13 Thread Ted Yu
Have you tried the unionAll() method of DataFrame? On Sun, Mar 13, 2016 at 8:44 PM, Divya Gehlot wrote: > Hi, > > Please bear with me for asking such a naive question > I have a list of conditions (dynamic SQLs) sitting in an HBase table. > I need to iterate through those dynamic

append rows to dataframe

2016-03-13 Thread Divya Gehlot
Hi, Please bear with me for asking such a naive question. I have a list of conditions (dynamic SQLs) sitting in an HBase table. I need to iterate through those dynamic SQLs and add the data to DataFrames. As we know, DataFrames are immutable; when I try to iterate in a for loop as shown below, I get only the last
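
Since DataFrames are immutable, the usual pattern is to re-assign the accumulated result on each iteration, as the unionAll() suggestion above implies. A minimal Scala sketch, assuming a source DataFrame dfresult and a sequence of filter strings dfFilterSQLs (names taken from the snippets in this thread):

    // Accumulate filtered slices into one DataFrame (Spark 1.6 unionAll API).
    val cols = Seq("Col1", "Col2", "Col3", "Col4", "Col5")
    var dffiltered = dfresult.filter(dfFilterSQLs.head).select(cols.head, cols.tail: _*)
    for (sql <- dfFilterSQLs.tail) {
      // Re-assign: unionAll returns a new DataFrame rather than mutating in place.
      dffiltered = dffiltered.unionAll(dfresult.filter(sql).select(cols.head, cols.tail: _*))
    }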

Re: Hive on Spark performance

2016-03-13 Thread Mich Talebzadeh
It depends on the version of Hive on the Spark engine. As far as I am aware, the latest version of Hive that I am using (Hive 2) has improvements compared to the previous versions of Hive (0.14, 1.2.1) on the Spark engine. As of today I have managed to use Hive 2.0 on Spark version 1.3.1. So it is not the

Re: Spark Job on YARN accessing Hbase Table

2016-03-13 Thread Ted Yu
The backport would be done under HBASE-14160. FYI On Sun, Mar 13, 2016 at 4:14 PM, Benjamin Kim wrote: > Ted, > > Is there anything in the works or are there tasks already to do the > back-porting? > > Just curious. > > Thanks, > Ben > > On Mar 13, 2016, at 3:46 PM, Ted Yu

Re: Spark Job on YARN accessing Hbase Table

2016-03-13 Thread Benjamin Kim
Ted, Is there anything in the works or are there tasks already to do the back-porting? Just curious. Thanks, Ben > On Mar 13, 2016, at 3:46 PM, Ted Yu wrote: > > class HFileWriterImpl (in standalone file) is only present in master branch. > It is not in branch-1. > >

Re: Spark Job on YARN accessing Hbase Table

2016-03-13 Thread Ted Yu
class HFileWriterImpl (in a standalone file) is only present in the master branch. It is not in branch-1. compressionByName() resides in a class with @InterfaceAudience.Private which got moved in the master branch. So it looks like there is some work to be done for backporting to branch-1 :-) On Sun, Mar 13,

Trying to serialize/deserialize Spark ML Pipeline (RandomForest) Spark 1.6

2016-03-13 Thread Mario Lazaro
Hi! I have a PipelineModel (using RandomForestClassifier) that I am trying to save locally. I can save it using:

    // save locally
    val fileOut = new FileOutputStream("file:///home/user/forest.model")
    val out = new ObjectOutputStream(fileOut)
    out.writeObject(model)
    out.close()
    fileOut.close()

Then
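
One note on the quoted snippet: java.io.FileOutputStream expects a plain filesystem path, not a file:// URI. A minimal sketch of plain Java serialization with a local path (assuming model is the PipelineModel from the thread, and that every pipeline stage is actually Java-serializable):

    import java.io.{FileInputStream, FileOutputStream, ObjectInputStream, ObjectOutputStream}
    import org.apache.spark.ml.PipelineModel

    // Write with a plain path rather than a file:// URI.
    val out = new ObjectOutputStream(new FileOutputStream("/home/user/forest.model"))
    out.writeObject(model)
    out.close()

    // Read it back the same way.
    val in = new ObjectInputStream(new FileInputStream("/home/user/forest.model"))
    val restored = in.readObject().asInstanceOf[PipelineModel]
    in.close()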

Re: Spark Job on YARN accessing Hbase Table

2016-03-13 Thread Benjamin Kim
Ted, I did as you said, but it looks like HBaseContext relies on some differences in HBase itself. [ERROR] /home/bkim/hbase-rel-1.0.2/hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/HBaseContext.scala:30: error: object HFileWriterImpl is not a member of package

Re: Spark Job on YARN accessing Hbase Table

2016-03-13 Thread Benjamin Kim
Ted, That’s great! I didn’t know. I will proceed with it as you said. Thanks, Ben > On Mar 13, 2016, at 12:42 PM, Ted Yu wrote: > > Benjamin: > Since hbase-spark is in its own module, you can pull the whole hbase-spark > subtree into hbase 1.0 root dir and add the

Re: Spark Job on YARN accessing Hbase Table

2016-03-13 Thread Ted Yu
Benjamin: Since hbase-spark is in its own module, you can pull the whole hbase-spark subtree into the hbase 1.0 root dir and add the following to the root pom.xml: <module>hbase-spark</module> Then you would be able to build the module yourself. The hbase-spark module uses APIs which are compatible with hbase 1.0

Re: Spark Job on YARN accessing Hbase Table

2016-03-13 Thread Benjamin Kim
Hi Ted, I see that you’re working on the hbase-spark module for hbase. I recently packaged the SparkOnHBase project and gave it a test run. It works like a charm on CDH 5.4 and 5.5. All I had to do was add /opt/cloudera/parcels/CDH/jars/htrace-core-3.1.0-incubating.jar to the classpath.txt

Re: Correct way to use spark streaming with apache zeppelin

2016-03-13 Thread Skanda
Hi, Storing state/intermediate data in realtime processing depends on how much throughput/latency your application requires. There are a lot of technologies that help you build this realtime datastore. Some examples include HBase, MemSQL, etc., or in some cases an RDBMS like MySQL itself. This is a

Re: Correct way to use spark streaming with apache zeppelin

2016-03-13 Thread trung kien
Thanks all for actively sharing your experience. @Chris: using something like Redis is something I am trying to figure out. I have a lot of transactions, so I couldn't trigger an update event for every single transaction. I'm looking at Spark Streaming because it provides batch processing (e.g. I
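
As an aside, the common pattern for pushing per-batch aggregates from Spark Streaming to an external store (Redis, MySQL, etc.) is foreachRDD with one connection per partition. A minimal sketch; ExternalStore here is a hypothetical client standing in for whatever store is chosen:

    // Write each micro-batch once per partition so a single connection is
    // reused across records instead of being opened per record.
    aggregatedStream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        val conn = ExternalStore.connect()   // hypothetical client, not a real API
        records.foreach { case (key, value) => conn.put(key, value) }
        conn.close()
      }
    }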

Save the model produced after training with ALS.

2016-03-13 Thread Shishir Anshuman
Hello, I am using the sample code for the ALS algorithm implementation. I want to save the model produced after training in a separate file. The 'modelPath' in model.save() stores some metadata. I am new to Apache Spark; please
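
For the mllib ALS API specifically, save() writes a directory containing metadata plus the model data, and the matching load() reads both back, so the metadata is expected rather than a problem. A minimal sketch (ratings and sc are assumed to exist, and the path is a placeholder):

    import org.apache.spark.mllib.recommendation.{ALS, MatrixFactorizationModel}

    // Train as in the sample code; rank/iterations/lambda are illustrative.
    val model = ALS.train(ratings, 10, 10, 0.01)

    // save() writes a directory (metadata/ and data/), not a single flat file.
    model.save(sc, "/tmp/alsModel")

    // Load the whole model back from that directory.
    val sameModel = MatrixFactorizationModel.load(sc, "/tmp/alsModel")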

Re: Strange behavior of collectNeighbors API in GraphX

2016-03-13 Thread Zhaokang Wang
After further debugging into this issue, I find that this bug is related to the triplets view of a graph in GraphX. If a graph is generated by outer-joining two other graphs via the outerJoinVertices operation, the graph's triplets view and the vertices view may be inconsistent. In the following
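
For reference, a minimal sketch of the operation under discussion, assuming an existing graph of type Graph[Int, String] named graph (the update values are placeholders; this only illustrates the API shape, not the reported inconsistency):

    import org.apache.spark.graphx._
    import org.apache.spark.rdd.RDD

    // Outer-join new attributes onto a graph; vertices missing from
    // updates fall back to their old attribute.
    val updates: RDD[(VertexId, Int)] = sc.parallelize(Seq((1L, 10), (2L, 20)))
    val joined: Graph[Int, String] =
      graph.outerJoinVertices(updates) { (vid, oldAttr, newOpt) =>
        newOpt.getOrElse(oldAttr)
      }

    // The triplets view should reflect the joined vertex attributes.
    joined.triplets.take(5).foreach(println)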

Re: Kafka + Spark streaming, RDD partitions not processed in parallel

2016-03-13 Thread Mukul Gupta
Sorry for the late reply. I am new to Java and it took me a while to set things up. Yes, you are correct that the Kafka client libs need not be specifically added. I didn't realize that. I removed them and the code still compiled. However, upon execution, I encountered the same issue as before.

Re: Spark SQL is not returning records for HIVE transactional tables on HDP

2016-03-13 Thread @Sanjiv Singh
Hi All, For Spark SQL we are using: Hive 1.2.1, Spark 1.3.1, Hadoop 2.7.1. Let me know if you need other details to debug the issue. Regards Sanjiv Singh Mob : +091 9990-447-339 On Sun, Mar 13, 2016 at 1:07 AM, Mich Talebzadeh wrote: > Hi, > >

Re: Correct way to use spark streaming with apache zeppelin

2016-03-13 Thread Chris Miller
Cool! Thanks for sharing. -- Chris Miller On Sun, Mar 13, 2016 at 12:53 AM, Todd Nist wrote: > Below is a link to an example which Silvio Fiorito put together > demonstrating how to link Zeppelin with Spark Streaming for real-time charts. > I think the original thread was