Bug in PolynomialExpansion?

2016-05-29 Thread Jeff Zhang
I use PolynomialExpansion to convert one vector to a 2-degree vector. I am confused about the result of the following. As I understand it, the 2-degree vector should contain four 1's; I am not sure where the five 1's come from. I think it is supposed to be (x1,x2,x3) * (x1,x2,x3) = (x1*x1, x1*x2, x1*x3, x2*x1, x2*x2,
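For anyone hitting the same confusion: spark.ml's PolynomialExpansion emits every monomial up to the requested degree, not just the pairwise products, so an n-dimensional input expands to (n+d choose d) - 1 features (9 for n=3, d=2). A minimal sketch below, assuming Spark 2.x imports and a SparkSession named spark; the input vector is a made-up example chosen to produce five 1's:

import org.apache.spark.ml.feature.PolynomialExpansion
import org.apache.spark.ml.linalg.Vectors

// Degree-2 expansion of a 3-dim vector yields 9 features, in this order:
// x1, x1^2, x2, x1*x2, x2^2, x3, x1*x3, x2*x3, x3^2
val df = spark.createDataFrame(Seq(
  Tuple1(Vectors.dense(1.0, 1.0, 0.0))  // hypothetical input
)).toDF("features")

val expanded = new PolynomialExpansion()
  .setInputCol("features")
  .setOutputCol("expanded")
  .setDegree(2)
  .transform(df)

expanded.show(false)
// For (1, 1, 0), five of the nine slots are 1 (x1, x1^2, x2, x1*x2, x2^2):
// one more than a naive pairwise product suggests, because the degree-1
// terms are included and duplicate cross terms (x1*x2 vs x2*x1) are not.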

Re: GraphX Java API

2016-05-29 Thread Takeshi Yamamuro
These packages are usable only from Scala. On Mon, May 30, 2016 at 2:23 PM, Kumar, Abhishek (US - Bengaluru) < abhishekkuma...@deloitte.com> wrote: > Hey, > > · I see some graphx packages listed here: > > http://spark.apache.org/docs/latest/api/java/index.html > > ·

RE: GraphX Java API

2016-05-29 Thread Kumar, Abhishek (US - Bengaluru)
Hey, I see some graphx packages listed here: http://spark.apache.org/docs/latest/api/java/index.html, including org.apache.spark.graphx,

Preview release of Spark 2.0

2016-05-29 Thread charles li
Here is the link: http://spark.apache.org/news/spark-2.0.0-preview.html congrats, haha, looking forward to 2.0.1, awesome project. -- *--* a spark lover, a quant, a developer and a good man. http://github.com/litaotao

Re: Reply: G1 GC takes too much time

2016-05-29 Thread Ted Yu
Please consider reading G1GC tuning guide(s). Here is an example: http://product.hubspot.com/blog/g1gc-tuning-your-hbase-cluster On Sun, May 29, 2016 at 7:17 PM, condor join wrote: > The following are the parameters: > -XX:+UseG1GC > -XX:+UnlockDiagnosticVMOptions >
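For readers skimming this thread, a minimal sketch of passing such flags to executors through SparkConf; the specific values are illustrative assumptions, not recommendations:

import org.apache.spark.SparkConf

// Illustrative G1 flags only -- tune against your own GC logs.
val conf = new SparkConf()
  .set("spark.executor.extraJavaOptions",
    "-XX:+UseG1GC " +
    "-XX:MaxGCPauseMillis=200 " +
    "-XX:InitiatingHeapOccupancyPercent=35 " +
    "-XX:+PrintGCDetails -XX:+PrintGCDateStamps")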

Reply: G1 GC takes too much time

2016-05-29 Thread Sea
Yes, it seems that CMS is better. I have tried G1 as Databricks' blog recommended, but it's too slow. ------------------ Original message ------------------ From: "condor join"; Date: Mon, May 30, 2016 10:17; To: "Ted Yu"; Subject:

G1 GC takes too much time

2016-05-29 Thread condor join
Hi, my Spark application failed because it took too much time in GC. Looking at the logs I found these things: 1. Young GC takes too much time, and no Full GC happened during this; 2. too much time is spent during the object copy; 3. it happened more easily when there were not enough

Re: Bulk loading Serialized RDD into Hbase throws KryoException - IndexOutOfBoundsException

2016-05-29 Thread Nirav Patel
Sure, let me try that. But from the looks of it, it seems kryo.util.MapReferenceResolver.getReadObject is trying to access an incorrect index (100). On Sun, May 29, 2016 at 5:06 PM, Ted Yu wrote: > Can you register Put with Kryo ? > > Thanks > > On May 29, 2016, at 4:58 PM,

Re: Bulk loading Serialized RDD into Hbase throws KryoException - IndexOutOfBoundsException

2016-05-29 Thread Ted Yu
Can you register Put with Kryo ? Thanks > On May 29, 2016, at 4:58 PM, Nirav Patel wrote: > > I pasted code snipped for that method. > > here's full def: > > def writeRddToHBase2(hbaseRdd: RDD[(ImmutableBytesWritable, Put)], > tableName: String) { > > > >
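For readers following the thread, registering the HBase write types with Kryo looks roughly like this; a sketch, with the class list inferred from the RDD's element types:

import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.spark.SparkConf

// Register the classes that flow through the serialized cache so Kryo
// uses explicit registrations instead of its generic fallback.
val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .registerKryoClasses(Array(classOf[Put], classOf[ImmutableBytesWritable]))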

Re: Accessing s3a files from Spark

2016-05-29 Thread Mayuresh Kunjir
On Sun, May 29, 2016 at 7:49 PM, Ted Yu wrote: > Have you seen this thread ? > > http://search-hadoop.com/m/q3RTthWU8o1MbFC2=Re+Forbidded+Error+Code+403 Thanks for the pointer. I have followed the thread, but with no success so far. I am trying out the Spark branch

Re: Bulk loading Serialized RDD into Hbase throws KryoException - IndexOutOfBoundsException

2016-05-29 Thread Nirav Patel
I pasted a code snippet for that method. Here's the full def: def writeRddToHBase2(hbaseRdd: RDD[(ImmutableBytesWritable, Put)], tableName: String) { hbaseRdd.values.foreachPartition{ itr => val hConf = HBaseConfiguration.create() hConf.setInt("hbase.client.write.buffer",
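Since the snippet is cut off, here is a hedged reconstruction of how the rest of such a method typically looks with the HBase 1.x client API; the buffer size and flush/close handling are assumptions, not the poster's actual code:

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{HTable, Put}
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.spark.rdd.RDD

def writeRddToHBase2(hbaseRdd: RDD[(ImmutableBytesWritable, Put)],
                     tableName: String): Unit = {
  hbaseRdd.values.foreachPartition { itr =>
    // One configuration and table handle per partition, created on the executor.
    val hConf = HBaseConfiguration.create()
    hConf.setInt("hbase.client.write.buffer", 4 * 1024 * 1024) // assumed size
    val table = new HTable(hConf, tableName)
    itr.foreach(put => table.put(put))
    table.flushCommits() // push buffered mutations before closing
    table.close()
  }
}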

Re: Accessing s3a files from Spark

2016-05-29 Thread Ted Yu
Have you seen this thread ? http://search-hadoop.com/m/q3RTthWU8o1MbFC2=Re+Forbidded+Error+Code+403 On Sun, May 29, 2016 at 2:55 PM, Mayuresh Kunjir wrote: > I'm running into permission issues while accessing data in S3 bucket > stored using s3a file system from a local

Re: Bulk loading Serialized RDD into Hbase throws KryoException - IndexOutOfBoundsException

2016-05-29 Thread Ted Yu
bq. at com.mycorpt.myprojjobs.spark.jobs.hbase.HbaseUtils$$anonfun$writeRddToHBase2$1.apply(HbaseUtils.scala:80) Can you reveal the related code from HbaseUtils.scala ? Which HBase version are you using ? Thanks On Sun, May 29, 2016 at 4:26 PM, Nirav Patel wrote: > Hi, >

Bulk loading Serialized RDD into Hbase throws KryoException - IndexOutOfBoundsException

2016-05-29 Thread Nirav Patel
Hi, I am getting the following Kryo deserialization error when trying to bulk load a cached RDD into HBase. It works if I don't cache the RDD. I cache it with MEMORY_ONLY_SER. Here's the code snippet: hbaseRdd.values.foreachPartition{ itr => val hConf = HBaseConfiguration.create()
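One detail worth spelling out: the serialized storage level is exactly what pushes every Put through Kryo, which is why the error appears only when caching. The relevant call, for context:

import org.apache.spark.storage.StorageLevel

// With MEMORY_ONLY_SER each element is Kryo-serialized on write and
// deserialized on every read -- the point where the
// IndexOutOfBoundsException in MapReferenceResolver surfaces.
hbaseRdd.persist(StorageLevel.MEMORY_ONLY_SER)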

Re: Multinomial regression with spark.ml version of LogisticRegression

2016-05-29 Thread Stephen Boesch
Thanks Bryan for that pointer; I will follow it. In the meantime, One-vs-Rest appears to satisfy the requirements. 2016-05-29 15:40 GMT-07:00 Bryan Cutler : > This is currently being worked on, planned for 2.1 I believe > https://issues.apache.org/jira/browse/SPARK-7159 >
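For anyone searching later, the One-vs-Rest workaround looks roughly like this in spark.ml; a sketch in which the train and test DataFrames (with "label" and "features" columns) are assumed:

import org.apache.spark.ml.classification.{LogisticRegression, OneVsRest}

// Binary logistic regression as the base classifier...
val lr = new LogisticRegression()
  .setMaxIter(100)
  .setRegParam(0.01)

// ...wrapped by OneVsRest to handle a multiclass label column.
val ovr = new OneVsRest().setClassifier(lr)
val ovrModel = ovr.fit(train)
val predictions = ovrModel.transform(test)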

Re: Multinomial regression with spark.ml version of LogisticRegression

2016-05-29 Thread Bryan Cutler
This is currently being worked on, planned for 2.1 I believe: https://issues.apache.org/jira/browse/SPARK-7159 On May 28, 2016 9:31 PM, "Stephen Boesch" wrote: > Thanks Phuong, but the point of my post is how to achieve this without using > the deprecated mllib package. The

Accessing s3a files from Spark

2016-05-29 Thread Mayuresh Kunjir
I'm running into permission issues while accessing data in S3 bucket stored using s3a file system from a local Spark cluster. Has anyone found success with this? My setup is: - Spark 1.6.1 compiled against Hadoop 2.7.2 - aws-java-sdk-1.7.4.jar and hadoop-aws-2.7.2.jar in the classpath - Spark's
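In case it helps others, a minimal sketch of wiring s3a into a job; the property names are the standard Hadoop 2.7 s3a keys, while the credentials and bucket are placeholders:

// Assumes a SparkContext named sc; credentials/bucket are placeholders.
val hadoopConf = sc.hadoopConfiguration
hadoopConf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
hadoopConf.set("fs.s3a.access.key", "YOUR_ACCESS_KEY")
hadoopConf.set("fs.s3a.secret.key", "YOUR_SECRET_KEY")

val rdd = sc.textFile("s3a://your-bucket/path/to/data")
println(rdd.count())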

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-29 Thread Mich Talebzadeh
Thanks. I think the problem is that the TEZ user group is exceptionally quiet. I just sent an email to the Hive user group to see if anyone has managed to build a vendor-independent version. Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-29 Thread Jörn Franke
Well, I think it is different from MR. It has some optimizations which you do not find in MR. Especially the LLAP option in Hive 2 makes it interesting. I think Hive 1.2 works with Tez 0.7 and Hive 2.0 with Tez 0.8. At least for 1.2 it is integrated in the Hortonworks distribution. > On 29 May 2016, at

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-29 Thread Mich Talebzadeh
Hi Jorn, I started building apache-tez-0.8.2 but got a few errors. A couple of guys from the TEZ user group kindly gave a hand, but I could not get very far (or maybe I did not make enough effort) making it work. That TEZ user group is very quiet as well. My understanding is that TEZ is MR with DAG, but of

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-29 Thread Jörn Franke
Very interesting. Do you also plan a test with TEZ? > On 29 May 2016, at 13:40, Mich Talebzadeh wrote: > > Hi, > > I did another study of Hive using the Spark engine compared to Hive with MR. > > Basically took the original table imported using Sqoop and created and >

Re: GraphX Java API

2016-05-29 Thread Jules Damji
Also, this blog talks about GraphFrames' implementation of some GraphX algorithms, accessible from Java, Scala, and Python: https://databricks.com/blog/2016/03/03/introducing-graphframes.html Cheers Jules Sent from my iPhone Pardon the dumb thumb typos :) > On May 29, 2016, at 12:24 AM,

Re: GraphX Java API

2016-05-29 Thread Takeshi Yamamuro
Hi, Have you checked GraphFrames? See the related discussion: https://issues.apache.org/jira/browse/SPARK-3665 // maropu On Fri, May 27, 2016 at 8:22 PM, Santoshakhilesh < santosh.akhil...@huawei.com> wrote: > GraphX APIs are available only in Scala. If you need to use GraphX you > need to
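To make the GraphFrames suggestion concrete, a minimal sketch; the toy graph is made up, and the import assumes the graphframes spark-package is on the classpath along with a SparkSession named spark:

import org.graphframes.GraphFrame

// GraphFrame expects an "id" column on vertices and "src"/"dst" on edges.
val vertices = spark.createDataFrame(Seq(
  (1L, "alice"), (2L, "bob"), (3L, "carol")
)).toDF("id", "name")

val edges = spark.createDataFrame(Seq(
  (1L, 2L, "follows"), (2L, 3L, "follows")
)).toDF("src", "dst", "relationship")

val g = GraphFrame(vertices, edges)

// The same algorithms GraphX exposes, but reachable from Java and Python too.
val ranks = g.pageRank.resetProbability(0.15).maxIter(10).run()
ranks.vertices.select("id", "pagerank").show()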