MLlib(Logistic Regression) + Spark Streaming.

2014-12-07 Thread Nasir Khan
I am new to Spark. Let's say I want to develop a machine learning model, trained the normal (batch) way in MLlib. I want to then use that model, a Logistic Regression classifier, to predict on streaming data coming from a file or socket. Streaming data - Logistic Regression - binary label.
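
A minimal sketch of this setup, assuming Spark 1.x MLlib and a socket stream; the training path and the one-comma-separated-vector-per-line input format are assumptions:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.util.MLUtils

    val conf = new SparkConf().setAppName("StreamingLR")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Train the model offline, the "normal" batch way (path is hypothetical).
    val training = MLUtils.loadLibSVMFile(ssc.sparkContext, "hdfs:///data/train.libsvm")
    val model = LogisticRegressionWithSGD.train(training, 100)

    // Score the live stream: one comma-separated feature vector per line (assumed format).
    val lines = ssc.socketTextStream("localhost", 9999)
    val predictions = lines
      .map(line => Vectors.dense(line.split(',').map(_.toDouble)))
      .map(features => model.predict(features)) // 0.0 or 1.0, the binary label
    predictions.print()

    ssc.start()
    ssc.awaitTermination()

The same batch-trained model works on a file-based source by swapping the socket stream for ssc.textFileStream(dir).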

Re: run JavaAPISuite with maven

2014-12-07 Thread Koert Kuipers
hey guys, I was able to run the test just fine with: $ sbt "project core" "testOnly org.apache.spark.JavaAPISuite". However, I found it strange that it didn't run when I do mvn test -pl core, or at least it didn't seem to run. This would mean that when someone tests/publishes with Maven the …

RE: Bulk-load to HBase

2014-12-07 Thread fralken
Hello, you can have a look at the project hbase-rdd (https://github.com/unicredit/hbase-rdd), which provides a simple method to bulk load an RDD to HBase. fralken
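
For contrast with hbase-rdd's HFile-based bulk load, a minimal sketch of the plainer write path through TableOutputFormat (regular puts, not a true bulk load); the table name, column family, and the (rowKey, value) RDD are assumptions:

    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.client.Put
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable
    import org.apache.hadoop.hbase.mapreduce.TableOutputFormat
    import org.apache.hadoop.hbase.util.Bytes
    import org.apache.hadoop.mapreduce.Job
    import org.apache.spark.SparkContext._

    val hconf = HBaseConfiguration.create()
    hconf.set(TableOutputFormat.OUTPUT_TABLE, "my_table") // hypothetical table
    val job = Job.getInstance(hconf)
    job.setOutputFormatClass(classOf[TableOutputFormat[ImmutableBytesWritable]])

    // rdd: RDD[(String, String)] of (rowKey, value) pairs, assumed to exist
    val puts = rdd.map { case (key, value) =>
      val put = new Put(Bytes.toBytes(key))
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(value))
      (new ImmutableBytesWritable, put)
    }
    puts.saveAsNewAPIHadoopDataset(job.getConfiguration)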

Re: run JavaAPISuite with maven

2014-12-07 Thread Ted Yu
Looking at the pom.xml, I think I found the reason - scalatest is used. With the following diff:

diff --git a/pom.xml b/pom.xml
index b7df53d..b0da893 100644
--- a/pom.xml
+++ b/pom.xml
@@ -947,7 +947,7 @@
         <version>2.17</version>
         <configuration>
           <!-- Uses scalatest …

NoClassDefFoundError

2014-12-07 Thread Julius K
Hi everyone, I am new to Spark and have encountered a problem. I want to use an external library in a Java project; compiling works fine with Maven, but at runtime (locally) I get a NoClassDefFoundError. Do I have to put the jars somewhere, or tell Spark where they are? I can send the pom.xml …

Re: NoClassDefFoundError

2014-12-07 Thread Ted Yu
See the following threads: http://search-hadoop.com/m/JW1q5kjNlK http://search-hadoop.com/m/JW1q5XqSDk Cheers. On Sun, Dec 7, 2014 at 9:35 AM, Julius K fooliuskool...@gmail.com wrote: …
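
A minimal sketch of the usual fix, with a hypothetical jar path: either pass the dependency on the command line (spark-submit --jars /path/to/external-dep.jar ...), ship it programmatically as below, or build a single fat jar with the maven-shade-plugin:

    import org.apache.spark.{SparkConf, SparkContext}

    // Ship the external jar to the executors along with the job (path is hypothetical).
    val conf = new SparkConf()
      .setAppName("MyApp")
      .setJars(Seq("/path/to/external-dep.jar"))
    val sc = new SparkContext(conf)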

saveAsParquetFile and DirectFileOutputCommitter Class not found Error

2014-12-07 Thread Addanki, Santosh Kumar
Hi, when we try to call saveAsParquetFile on a SchemaRDD we get the following error: Py4JJavaError: An error occurred while calling o384.saveAsParquetFile. : java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/lib/output/DirectFileOutputCommitter at …

RE: Is there a way to force spark to use specific ips?

2014-12-07 Thread Ashic Mahtab
Hi Matt, that's what I'm seeing too. I've reverted to creating a fact in the Vagrantfile + adding the host in Puppet. Saves having to have the Vagrant plugin installed. vagrant-hosts looks interesting for scenarios where I control all the machines. Cheers, Ashic.

Re: run JavaAPISuite with maven

2014-12-07 Thread Koert Kuipers
so as part of the official build the Java API does not get tested then? I am sure there is a good reason for it, but that's surprising to me. On Sun, Dec 7, 2014 at 12:19 PM, Ted Yu yuzhih...@gmail.com wrote: …

Re: run JavaAPISuite with maven

2014-12-07 Thread Sean Owen
I think it's a known issue: https://issues.apache.org/jira/browse/SPARK-4159 https://issues.apache.org/jira/browse/SPARK-661 I got bitten by this too recently and meant to look into it. On Sun, Dec 7, 2014 at 4:50 PM, Koert Kuipers ko...@tresata.com wrote: …

Re: run JavaAPISuite with maven

2014-12-07 Thread Koert Kuipers
thanks, that makes sense. I searched the mailing list but couldn't find any mention of it; I should have searched JIRA instead... On Sun, Dec 7, 2014 at 6:25 PM, Sean Owen so...@cloudera.com wrote: …

Re: spark-submit on YARN is slow

2014-12-07 Thread Tobias Pfeiffer
Hi, thanks for your responses! On Sat, Dec 6, 2014 at 4:22 AM, Sandy Ryza sandy.r...@cloudera.com wrote: What version are you using? In some recent versions, we had a couple of large hardcoded sleeps on the Spark side. I am using Spark 1.1.1. As Andrew mentioned, I guess most of the 10 …

RE: spark assembly jar caused changed on src filesystem error

2014-12-07 Thread Hu, Leo
If anybody knows the reason, please help me. Thanks a lot. Best Regards, LEO HU, CDSP, SAP LABS CHINA. From: Hu, Leo [mailto:leo.h...@sap.com] Sent: Friday, December 05, 2014 10:23 AM To: u...@spark.incubator.apache.org Subject: spark assembly jar caused changed on src filesystem error. Hi all …

Re: Does filter on an RDD scan every data item ?

2014-12-07 Thread 诺铁
There is a *PartitionPruningRDD* (:: DeveloperApi ::): an RDD used to prune partitions so we can avoid launching tasks on all of them. An example use case: if we know the RDD is partitioned by range, and the execution DAG has a filter on the key, we can avoid launching tasks on …
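
A minimal sketch of the idea with a hypothetical 10-partition RDD, keeping only partition 3 so tasks launch on one partition instead of all ten (PartitionPruningRDD is a DeveloperApi):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.rdd.PartitionPruningRDD

    val sc = new SparkContext(new SparkConf().setAppName("pruning").setMaster("local[*]"))
    val rdd = sc.parallelize(1 to 100, 10) // 10 range partitions of 10 elements each

    // Launch tasks only where the partition filter holds, instead of scanning every partition.
    val pruned = PartitionPruningRDD.create(rdd, partitionIndex => partitionIndex == 3)
    println(pruned.collect().mkString(",")) // 31,32,...,40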

Spark SQL: How to get the hierarchical element with SQL?

2014-12-07 Thread Xuelin Cao
Hi,
I'm generating a Spark SQL table from an offline JSON file.
The difficulty is that the original JSON file has a hierarchical structure. As a result, this is what I get:

scala> tb.printSchema
root
 |-- budget: double (nullable = true)
 |-- filterIp: array (nullable = true)
 |    …

Print Node info. of Decision Tree

2014-12-07 Thread jake Lim
How can I print the node info of a Decision Tree model? I want to navigate and print all information of the Decision Tree model. Is there some kind of function/method to support it?
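
A minimal sketch, assuming MLlib's DecisionTreeModel in Spark 1.x: the root is exposed as model.topNode, and each Node carries an optional split and optional children, so a recursive walk prints the whole tree (Spark 1.2+ also offers model.toDebugString as a ready-made dump):

    import org.apache.spark.mllib.tree.model.Node

    def printTree(node: Node, indent: String = ""): Unit = {
      if (node.isLeaf) {
        println(s"${indent}Leaf id=${node.id} predict=${node.predict}")
      } else {
        println(s"${indent}Node id=${node.id} split=${node.split.get} predict=${node.predict}")
        node.leftNode.foreach(printTree(_, indent + "  "))
        node.rightNode.foreach(printTree(_, indent + "  "))
      }
    }

    // model: a trained DecisionTreeModel, assumed to exist
    printTree(model.topNode)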

Re: Spark SQL: How to get the hierarchical element with SQL?

2014-12-07 Thread Cheng Lian
You may access it via something like SELECT filterIp.element FROM tb, just like Hive. Or, if you're using the Spark SQL DSL, you can use tb.select("filterIp.element".attr). On 12/8/14 1:08 PM, Xuelin Cao wrote: …
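
A minimal end-to-end sketch around this suggestion, assuming Spark 1.x and a hypothetical JSON path (here just selecting the whole array field; the nested filterIp.element syntax above is quoted from the reply):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext(new SparkConf().setAppName("nested-json"))
    val sqlContext = new SQLContext(sc)

    // Load the offline JSON file (path is hypothetical) and register it as a table.
    val tb = sqlContext.jsonFile("hdfs:///data/offline.json")
    tb.registerTempTable("tb")

    // Query the hierarchical field.
    sqlContext.sql("SELECT filterIp FROM tb").collect().foreach(println)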

monitoring for spark standalone

2014-12-07 Thread Judy Nash
Hello, are there ways we can programmatically get the health status of master/slave nodes, similar to Hadoop Ambari? The wiki seems to suggest there are only the web UI or the metrics instrumentation (http://spark.apache.org/docs/latest/monitoring.html). Thanks, Judy
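
One programmatic option, a minimal sketch assuming a standalone master on the default UI port (the host name is hypothetical): the master's web UI also serves its status, workers included, as JSON under /json:

    import scala.io.Source

    // Poll the standalone master's JSON status endpoint.
    val status = Source.fromURL("http://spark-master:8080/json").mkString
    println(status) // JSON listing workers, their state, cores, memory, and running apps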

Is there a way to get column names using hiveContext ?

2014-12-07 Thread abhishek
Hi, I have iplRDD, which is JSON, and I do the steps below and query through HiveContext. I get the results, but without column headers. Is there a way to get the column names? val teamRDD = hiveContext.jsonRDD(iplRDD) teamRDD.registerTempTable("teams") hiveContext.cacheTable("teams") val result …
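
A minimal sketch of one way to recover the headers, assuming Spark 1.x, where the query returns a SchemaRDD whose schema carries the field names:

    // result: the SchemaRDD returned by hiveContext.sql(...), assumed to exist
    val columnNames = result.schema.fields.map(_.name)
    println(columnNames.mkString("\t")) // header row
    result.collect().foreach(row => println(row.mkString("\t"))) // data rows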