I am new to Spark.
Let's say I want to develop a machine learning model, trained the normal
(batch) way in MLlib. I want to use that model with a logistic regression
classifier to predict streaming data coming from a file or socket.
Streaming data - Logistic Regression - binary label
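A minimal sketch of one way to do this (assuming an existing SparkContext sc;
the file path, host/port, and the comma-separated "label,feature,feature,..."
record format are assumptions), using MLlib's batch LogisticRegressionWithSGD
plus Spark Streaming:

import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.streaming.{Seconds, StreamingContext}

// 1) Train the model the normal (batch) way on historical data.
val training = sc.textFile("/path/to/training.csv").map { line =>
  val parts = line.split(',').map(_.toDouble)
  LabeledPoint(parts.head, Vectors.dense(parts.tail))
}
val model = LogisticRegressionWithSGD.train(training, 100)

// 2) Apply the trained model to records arriving from a socket.
//    (ssc.textFileStream(dir) works the same way for a directory of files.)
val ssc = new StreamingContext(sc, Seconds(10))
val scored = ssc.socketTextStream("localhost", 9999).map { line =>
  val features = Vectors.dense(line.split(',').map(_.toDouble))
  (line, model.predict(features)) // 0.0 or 1.0 for the binary label
}
scored.print()
ssc.start()
ssc.awaitTermination()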
Hey guys,
I was able to run the test just fine with:
$ sbt
project core
testOnly org.apache.spark.JavaAPISuite
However, I found it strange that it didn't run when I did mvn test -pl core,
or at least it didn't seem to run. This would mean that when someone
tests/publishes with Maven the
Hello, you can have a look at the project hbase-rdd
(https://github.com/unicredit/hbase-rdd), which provides a simple method to
bulk load an RDD to HBase.
fralken
Looking at the pom.xml, I think I found the reason - scalatest is used.
With the following diff:
diff --git a/pom.xml b/pom.xml
index b7df53d..b0da893 100644
--- a/pom.xml
+++ b/pom.xml
@@ -947,7 +947,7 @@
        <version>2.17</version>
        <configuration>
          <!-- Uses scalatest
Hi everyone,
I am new to Spark and encountered a problem.
I want to use an external library in a Java project and compiling
works fine with Maven, but during runtime (locally) I get a
NoClassDefFoundError.
Do I have to put the jars somewhere, or tell Spark where they are?
I can send the pom.xml
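A common fix (just a sketch, not specific to your pom): ship the extra jar
with the application, e.g. via spark-submit --jars, by building a fat/assembly
jar with Maven, or by listing it on the SparkConf as below. The jar path here
is a placeholder.

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: list the external jar so Spark ships it to the executors at runtime.
val conf = new SparkConf()
  .setAppName("uses-external-lib")
  .setJars(Seq("/path/to/external-lib.jar"))
val sc = new SparkContext(conf)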
See the following threads:
http://search-hadoop.com/m/JW1q5kjNlK
http://search-hadoop.com/m/JW1q5XqSDk
Cheers
On Sun, Dec 7, 2014 at 9:35 AM, Julius K fooliuskool...@gmail.com wrote:
Hi everyone,
I am new to Spark and encountered a problem.
I want to use an external library in a java
Hi,
When we try to call saveAsParquetFile on a SchemaRDD we get the following
error:
Py4JJavaError: An error occurred while calling o384.saveAsParquetFile.
: java.lang.NoClassDefFoundError:
org/apache/hadoop/mapreduce/lib/output/DirectFileOutputCommitter
at
Hi Matt,
That's what I'm seeing too. I've reverted to creating a fact in the
Vagrantfile + adding the host in Puppet. Saves having to have the Vagrant
plugin installed. Vagrant-hosts looks interesting for scenarios where I control
all the machines.
Cheers, Ashic.
Subject: Re: Is there a way to
So as part of the official build the Java API does not get tested then?
I am sure there is a good reason for it, but that's surprising to me.
On Sun, Dec 7, 2014 at 12:19 PM, Ted Yu yuzhih...@gmail.com wrote:
Looking at the pom.xml, I think I found the reason - scalatest is used.
With the
I think it's a known issue:
https://issues.apache.org/jira/browse/SPARK-4159
https://issues.apache.org/jira/browse/SPARK-661
I got bit by this too recently and meant to look into it.
On Sun, Dec 7, 2014 at 4:50 PM, Koert Kuipers ko...@tresata.com wrote:
so as part of the official build the
Thanks, that makes sense. I searched the mailing list but couldn't find any
mention of it. I should have searched JIRA instead...
On Sun, Dec 7, 2014 at 6:25 PM, Sean Owen so...@cloudera.com wrote:
I think it's a known issue:
https://issues.apache.org/jira/browse/SPARK-4159
Hi,
thanks for your responses!
On Sat, Dec 6, 2014 at 4:22 AM, Sandy Ryza sandy.r...@cloudera.com wrote:
What version are you using? In some recent versions, we had a couple of
large hardcoded sleeps on the Spark side.
I am using Spark 1.1.1.
As Andrew mentioned, I guess most of the 10
If anybody knows the reason, please help me. Thanks a lot.
Thanks
Best Regard
LEO HU
CDSP
SAP LABS CHINA
From: Hu, Leo [mailto:leo.h...@sap.com]
Sent: Friday, December 05, 2014 10:23 AM
To: u...@spark.incubator.apache.org
Subject: spark assembly jar caused changed on src filesystem error
Hi all
there is a
*PartitionPruningRDD*
:: DeveloperApi :: An RDD used to prune RDD partitions so we can
avoid launching tasks on all partitions. An example use case: if we know
the RDD is partitioned by range, and the execution DAG has a filter on the
key, we can avoid launching tasks on
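A small sketch of how it can be used (assuming an existing SparkContext sc;
the parent RDD and the pruning predicate below are made up):

import org.apache.spark.rdd.PartitionPruningRDD

// Sketch: keep only the first two partitions of the parent RDD, so no tasks
// are launched for the other eight.
val parent = sc.parallelize(1 to 1000, 10)
val pruned = PartitionPruningRDD.create(parent, partitionIndex => partitionIndex < 2)
println(pruned.partitions.length) // 2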
Hi,
I'm generating a Spark SQL table from an offline Json file.
The difficulty is, in the original json file, there is a hierarchical
structure. And, as a result, this is what I get:
scala> tb.printSchema
root
 |-- budget: double (nullable = true)
 |-- filterIp: array (nullable = true)
 |
How can I print node info of a decision tree model?
I want to navigate and print all information of a DecisionTreeModel.
Is there some kind of function/method to support it?
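One way to do it (a sketch; the helper below just walks the tree from
model.topNode, and depending on your Spark version model.toDebugString may
already print the whole tree for you):

import org.apache.spark.mllib.tree.model.{DecisionTreeModel, Node}

// Sketch: recursively print each node's id, prediction and split for a trained
// MLlib decision tree.
def printTree(node: Node, indent: String = ""): Unit = {
  val splitInfo = node.split
    .map(s => s"split on feature ${s.feature} at threshold ${s.threshold}")
    .getOrElse("leaf")
  println(s"${indent}node ${node.id}: predict=${node.predict}, $splitInfo")
  node.leftNode.foreach(printTree(_, indent + "  "))
  node.rightNode.foreach(printTree(_, indent + "  "))
}

// usage, given a trained DecisionTreeModel called `model`:
// printTree(model.topNode)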
Hi,
I'm generating a Spark SQL table from an offline Json file.
The difficulty is, in the original json file, there is a hierarchical
structure. And, as a result, this is what I get:
scala> tb.printSchema
root
 |-- budget: double (nullable = true)
 |-- filterIp: array (nullable = true)
 |
You may access it via something like SELECT filterIp.element FROM tb,
just like Hive. Or if you're using the Spark SQL DSL, you can use
tb.select(filterIp.element.attr).
On 12/8/14 1:08 PM, Xuelin Cao wrote:
Hi,
I'm generating a Spark SQL table from an offline Json file.
The
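Putting that into a small self-contained sketch (assuming an existing
SparkContext sc; the file path is a placeholder and filterIp comes from the
schema shown above):

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)

// Load the offline JSON file; nested structures become struct/array columns.
val tb = sqlContext.jsonFile("/path/to/offline.json")
tb.registerTempTable("tb")
tb.printSchema()

// Query the nested column (the reply above suggests filterIp.element for
// reaching inside the array, HiveQL-style).
sqlContext.sql("SELECT filterIp FROM tb").collect().foreach(println)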
Hello,
Are there ways we can programmatically get the health status of master and
slave nodes, similar to Hadoop Ambari?
The wiki seems to suggest there are only the web UI or instrumentation
endpoints (http://spark.apache.org/docs/latest/monitoring.html).
Thanks,
Judy
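Not a complete answer, but one programmatic option with the standalone
cluster manager (a sketch; the hostname is a placeholder): the Master web UI
also serves its state, workers, and running applications as JSON at a /json
endpoint, and the metrics system described on that monitoring page can expose
a JSON servlet as well.

import scala.io.Source

// Sketch: fetch the standalone Master's cluster state as JSON.
val masterState = Source.fromURL("http://spark-master:8080/json").mkString
println(masterState)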
Hi,
I have iplRDD, which is JSON, and I do the steps below and query through
HiveContext. I get the results, but without column headers. Is there a
way to get the column names?
val teamRDD = hiveContext.jsonRDD(iplRDD)
teamRDD.registerTempTable("teams")
hiveContext.cacheTable("teams")
val result
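For the column names specifically, a sketch of one approach: the schema
inferred by jsonRDD keeps the field names, so they can be read off the query
result (the SELECT below is a placeholder).

// Sketch: pair each row's values with the field names from the schema.
val rows = hiveContext.sql("SELECT * FROM teams")
val columnNames = rows.schema.fields.map(_.name)
println(columnNames.mkString(", "))
rows.collect().foreach { row =>
  println(columnNames.zip(row.toSeq).map { case (n, v) => s"$n=$v" }.mkString(", "))
}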