Damn, you're right, I wasn't looking at it properly. I was confused by
IntelliJ, I guess.
Many thanks!
On 2014-10-02 19:02, Sean Owen wrote:
Eh, is it not that you are mapping the values of an RDD whose keys are
StringStrings, but expecting the keys are Strings? That's also about
what the
I used to face this while running on a single-node machine when I allocated
more memory for the executor (i.e., my machine had 28GB of memory and I
allocated 26GB for the executor; dropping the executor memory from 26GB to
20GB solved my issue). If you are seeing an executor lost exception then you
can
What is your cluster setup, and how much memory are you allocating to the
executor?
Thanks
Best Regards
On Fri, Oct 3, 2014 at 7:52 AM, jamborta jambo...@gmail.com wrote:
Hi Arun,
Have you found a solution? Seems that I have the same problem.
thanks,
Hi all, I tried to launch my application with spark-submit, the command I
use is:
bin/spark-submit --class ${MY_CLASS} --jars ${MY_JARS} --master local
myApplicationJar.jar
I've built Spark with SPARK_HIVE=true, was able to start HiveContext,
and was able to run commands like,
Are you running master? There was briefly a regression here that is
hopefully fixed by spark#2635 https://github.com/apache/spark/pull/2635.
On Fri, Oct 3, 2014 at 1:43 AM, Kevin Paul kevinpaulap...@gmail.com wrote:
Hi all, I tried to launch my application with spark-submit, the command I
use
Hi,
I installed hbase-0.98.6-hadoop2. It's working with no problems. But when I
try to run the Spark HBase Python examples (the wordcount examples work, so
it's not a Python issue):
./bin/spark-submit --master local --driver-class-path
./examples/target/spark-examples_2.10-1.1.0.jar
Often java.lang.NoSuchMethodError means that you have more than one version
of a library on your classpath, in this case it looks like hive.
On Thu, Oct 2, 2014 at 8:44 PM, Li HM hmx...@gmail.com wrote:
I have rebuilt the package with -Phive
Copied hive-site.xml to conf (I am using hive-0.12)
The current approach is to use mapPartitions: initialize the connection at
the beginning of each partition, iterate through the data, then close off the
connection.
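A minimal sketch of that pattern in Scala (Connection and createConnection below are hypothetical stand-ins for whatever connector you actually use; note that the partition iterator has to be consumed before the connection is closed):

import org.apache.spark.rdd.RDD

// Hypothetical connector API -- swap in your real client here.
trait Connection { def write(s: String): Unit; def close(): Unit }
def createConnection(): Connection = ???

def writeOut(rdd: RDD[String]): RDD[String] =
  rdd.mapPartitions { iter =>
    val conn = createConnection()                               // open once per partition
    val out = iter.map { rec => conn.write(rec); rec }.toList   // force evaluation before closing
    conn.close()                                                // close when the partition is done
    out.iterator
  }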
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi
On Fri, Oct 3, 2014 at 10:16 AM, Stephen
Also make sure to call hiveContext.sql within the same thread where
hiveContext is created, because Hive uses a thread-local variable to
initialize the Driver.conf.
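For illustration, a minimal sketch of what that looks like (assuming a plain standalone app; the only point is that the HiveContext is both created and queried on the same thread):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveSameThread {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HiveSameThread"))
    // Construction and the sql() calls both happen on the main thread,
    // so Hive's thread-local Driver configuration is initialized correctly.
    val hiveContext = new HiveContext(sc)
    hiveContext.sql("SHOW TABLES").collect().foreach(println)
    sc.stop()
  }
}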
On 10/3/14 4:52 PM, Michael Armbrust wrote:
Are you running master? There was briefly a regression here that is
hopefully
Hi,
I have set up a Spark 0.9.2 standalone cluster using CDH5 and the pre-built
Spark distribution archive for Hadoop 2. I was not using the spark-ec2
scripts because I am not on the EC2 cloud.
Spark-shell seems to be working properly -- I am able to perform simple
RDD operations, as well as e.g. run SparkPi.
I have two nodes with 96GB RAM and 16 cores each; my setup is as follows:
conf = (SparkConf()
        .setMaster("yarn-cluster")
        .set("spark.executor.memory", "30G")
        .set("spark.cores.max", "32")
        .set("spark.executor.instances", "2")
        .set("spark.executor.cores", "8")
Hi,
I am quite a new user of Spark, and I have a basic question about mounting
the ephemeral disks for AWS EC2.
If I understand the spark_ec2.py script correctly, it is spark-ec2/setup-slave.sh
that mounts the ephemeral disks for AWS EC2 (Instance Store Volumes). However,
in setup-slave.sh, it seems that these
Hi,
I have set up Spark 1.0.2 on the cluster using standalone mode, and the input is
managed by HDFS. One node of the cluster has an Intel Xeon Phi 5110P coprocessor.
Is there any possibility that Spark could be aware of the Phi and run jobs on the
Xeon Phi? Do I have to modify the scheduler code?
What are the specific features of the Intel Xeon Phi that could be utilized by
Spark?
2014-10-03 18:09 GMT+08:00 余 浪 yulan...@gmail.com:
Hi,
I have set up Spark 1.0.2 on the cluster using standalone mode and the
input is managed by HDFS. One node of the cluster has Intel Xeon Phi 5110P
Hi Team,
When I am trying to use DenseMatrix from the breeze library in Spark, it's
throwing the following error:
java.lang.NoClassDefFoundError: breeze/storage/Zero
Can someone help me on this ?
Thanks,
Padma Ch
Hi,
How can the Spark logs be saved to a file instead of being shown on the console?
Below is my conf/log4j.properties
conf/log4j.properties
###
# Root logger option
log4j.rootLogger=INFO, file
# Direct log messages to a log file
log4j.appender.file=org.apache.log4j.RollingFileAppender
#Redirect
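For reference, a minimal sketch of a conf/log4j.properties that routes everything to a rolling file instead of the console (the file path and size limits below are just example values):

log4j.rootLogger=INFO, file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/tmp/spark.log
log4j.appender.file.MaxFileSize=10MB
log4j.appender.file.MaxBackupIndex=10
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n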
Digging a bit deeper, the executors get lost when the memory gets close to
the physical memory size:
http://apache-spark-user-list.1001560.n3.nabble.com/file/n15680/memory_usage.png
I'm not clear whether I am allocating too much or too little memory in this case.
thanks,
Yes, though it's a little more complex than that:
http://mail-archives.apache.org/mod_mbox/spark-user/201407.mbox/%3CCAPH-c_O9kQO6yJ4khXUVdO=+D4vj=JfG2tP9eqn5RPko=dr...@mail.gmail.com%3E
On Fri, Oct 3, 2014 at 9:58 AM, Mayur Rustagi mayur.rust...@gmail.com wrote:
Current approach is to use
Cool, thanks. Will set this up and report back how things went. Regards, Sanjay
From: Daniel Siegmann daniel.siegm...@velos.io
To: Ashish Jain ashish@gmail.com
Cc: Sanjay Subramanian sanjaysubraman...@yahoo.com; user@spark.apache.org
user@spark.apache.org
Sent: Thursday, October 2, 2014
For IntelliJ + SBT, you can also follow the directions at
http://jayunit100.blogspot.com/2014/07/set-up-spark-application-devleopment.html
. It's really easy to run Spark in an IDE. The process for Eclipse is
virtually identical.
On Fri, Oct 3, 2014 at 10:03 AM, Sanjay Subramanian
Hi ssimanta,
Were you able to resolve the problem where the standalone Scala program fails
but the Spark REPL works just fine? I am getting the same issue...
Thanks,
Irina
Just getting started with Spark, so hopefully this is all there and I just
haven't found it yet.
I have a driver program on my client machine, and I can use addFiles to
distribute files to the remote worker nodes of the cluster. They are there to
be found by my code running in the executors, so all is
Thanks -- it does appear that I misdiagnosed a bit: CASE works generally,
but it doesn't seem to like the bit operation (the type of bit_field in Hive
is bigint):
Error: java.lang.RuntimeException:
Unsupported language features in query: select (case when bit_field & 1=1
When you're running spark-shell and the example, are you actually
specifying --master spark://master:7077 as shown here:
http://spark.apache.org/docs/latest/programming-guide.html#initializing-spark
because if you're not, your spark-shell is running in local mode and not
actually connecting to
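For reference, that invocation would look something like this (the host and port are placeholders for whatever your standalone master actually advertises):

./bin/spark-shell --master spark://master:7077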
Hello Spark Gurus,
I am trying to learn Spark. I am especially interested in GraphX.
Since Spark can be used in a streaming context as well, I wanted to know
whether it is possible to use the Spark toolkits like GraphX or MLlib
in a streaming context.
Apologies if this is a stupid question but I am
Did you add a different version of breeze to the classpath? In Spark
1.0 we use breeze 0.7, and in Spark 1.1 we use 0.9. If the breeze
version you used is different from the one that comes with Spark, you might
see a class-not-found error. -Xiangrui
On Fri, Oct 3, 2014 at 4:22 AM, Priya Ch
Thanks for your explanation.
From: Cheng Lian lian.cs@gmail.commailto:lian.cs@gmail.com
Date: Thursday, October 2, 2014 at 8:01 PM
To: Du Li l...@yahoo-inc.com.INVALIDmailto:l...@yahoo-inc.com.INVALID,
d...@spark.apache.orgmailto:d...@spark.apache.org
I was able to run collaborative filtering with low rank values, like 20-160,
on the Netflix dataset, but it fails with the following error when I set
the rank to 1000:
14/10/03 03:27:36 WARN TaskSetManager: Loss was due to
java.lang.IllegalArgumentException
java.lang.IllegalArgumentException:
Hi All,
A year ago we started this journey and laid the path for the Spark + Cassandra
stack. We established the groundwork and direction for Spark Cassandra
connectors, and we have been happy seeing the results.
With the Spark 1.1.0 and Spark SQL release, it's time to take Calliope
Hi,
Sorry, I am not very familiar with Java. I found that if I set the RDD
partition number higher, I hit this error message:
java.lang.OutOfMemoryError: Requested array size exceeds VM limit;
however, if I set the RDD partition number lower, the error is gone.
My AWS EC2 cluster has 72
The current impl of ALS constructs least squares subproblems in
memory. So for rank 100, the total memory it requires is about 480,189
* 100^2 / 2 * 8 bytes ~ 20GB, divided by the number of blocks. For
rank 1000, this number goes up to 2TB, unfortunately. There is a JIRA
for optimizing ALS:
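A quick back-of-the-envelope sketch of that arithmetic (assuming the 480,189 count and the 8-byte doubles from the estimate above, with rank^2 / 2 entries per set of normal equations):

val n = 480189L
def alsBytes(rank: Long): Long = n * rank * rank / 2 * 8
println(alsBytes(100) / 1e9)   // ~19 GB at rank 100
println(alsBytes(1000) / 1e12) // ~1.9 TB at rank 1000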
Yana, many thanks for looking into this!
I am not running spark-shell in local mode; I am really starting
spark-shell with --master spark://master:7077 and running in cluster mode.
The second thing is that I tried to set spark.driver.host to master both in
the Scala app when creating the context, and in
Thanks, Xiangrui.
I didn't check the test error yet. I agree that rank 1000 might overfit for
this particular dataset. Currently I'm just running some scalability tests -
I'm trying to see how large the model can be scaled to given a fixed amount
of hardware.
Thanks for digging in! These both look like they should have JIRAs.
On Fri, Oct 3, 2014 at 8:14 AM, Yana Kadiyska yana.kadiy...@gmail.com
wrote:
Thanks -- it does appear that I misdiagnosed a bit: case works generally
but it doesn't seem to like the bit operation, which does not seem to work
Hi,
I also have a use for count-based windowing. I'd like to process data
batches by size as opposed to time. Is this feature on the development
roadmap? Is there a JIRA ticket for it?
Thank you,
Michael
I don't think it's a red herring... (btw. spark.driver.host needs to be set
to the IP or FQDN of the machine where you're running the program).
I am running 0.9.2 on CDH4 and the beginning of my executor log looks like
below (I've obfuscated the IP -- this is the log from executor
Thanks Matei, will check out the MLLib implementation.
On Wed, Oct 1, 2014 at 2:24 PM, Andy Twigg andy.tw...@gmail.com wrote:
Yes, that makes sense. It's similar to the all reduce pattern in vw.
On Wednesday, 1 October 2014, Matei Zaharia matei.zaha...@gmail.com
wrote:
Some of the MLlib
I have taken a look at the code of mesos/spark-ec2 and the AWS documentation.
I think that maybe I found the answer.
In fact, there are two types of AMI in AWS: EBS-backed AMIs and instance-store-backed
AMIs. For EBS-backed AMIs, we can add instance store volumes when we
create the images (the details can
Maybe you can follow the instructions in this link:
https://github.com/mesos/spark-ec2/tree/v3/ganglia . For me it works well.
Hi everyone,
What is the state of affairs w.r.t. Python 3? Is this post still a
good description of the situation?
https://groups.google.com/forum/#!topic/spark-users/GRKmVo0ZDBc
Thanks!
Ariel
According to the official Spark site, the latest version of Spark (1.1.0)
does not work with Python 3:
"Spark 1.1.0 works with Python 2.6 or higher (but not Python 3). It uses the
standard CPython interpreter, so C libraries like NumPy can be used."
Hi all,
We're running Spark 1.0 on CDH 5.1.2. We're using Spark in YARN-client
mode.
We're seeing that one of our nodes is not being assigned any tasks, and no
resources (RAM,cpu) are being used on this node. In the CM UI this worker
node is in good health and the spark Worker process is
This is my SPARK_CLASSPATH after cleanup
SPARK_CLASSPATH=/home/test/lib/hcatalog-core.jar:$SPARK_CLASSPATH
Now 'use mydb' works,
but 'show tables' and 'select * from test' still give an exception:
spark-sql> show tables;
OK
java.io.IOException: java.io.IOException: Cannot create an instance of
Maybe I am wrong, but how many resources a Spark application can use
depends on the mode of deployment (the type of resource manager); you can
take a look at https://spark.apache.org/docs/latest/job-scheduling.html .
For your case, I
Why are you including hcatalog-core.jar? That is probably causing the
issues.
On Fri, Oct 3, 2014 at 3:03 PM, Li HM hmx...@gmail.com wrote:
This is my SPARK_CLASSPATH after cleanup
SPARK_CLASSPATH=/home/test/lib/hcatalog-core.jar:$SPARK_CLASSPATH
now use mydb works.
but show tables and
Any idea why my email was returned with the following error message?
Thanks
Andy
This is the mail system at host smtprelay06.hostedemail.com.
I'm sorry to have to inform you that your message could not
be delivered to one or more recipients. It's attached below.
For further assistance,
I notice that accumulators register themselves with a private Accumulators
object.
I don't notice any way to unregister them when one is done.
Am I missing something? If not, is there any plan for how to free up that
memory?
I've a case where we're gathering data from repeated queries using
Hi, going through the Spark MLlib docs I have noticed that it supports multiclass
classification. Can anybody help me with implementing multilabel
classification on Spark, like in the Mulan
(http://mulan.sourceforge.net/index.html) and Meka
(http://meka.sourceforge.net/) libraries?
It would be really helpful if you can help test the scalability of the
new ALS impl:
https://github.com/mengxr/spark-als/blob/master/src/main/scala/org/apache/spark/ml/SimpleALS.scala
. It should be faster and more scalable, but the code is messy now.
Best,
Xiangrui
On Fri, Oct 3, 2014 at 11:57
If I don't have that jar, I am getting the following error:
Exception in thread "main" java.lang.RuntimeException:
org.apache.hadoop.hive.ql.metadata.HiveException:
java.lang.ClassNotFoundException:
org.apache.hcatalog.security.HdfsAuthorizationProvider
at
No, it is hive 0.12.4.
Let me try your suggestion. It is an existing hive db. I am using the original
hive-site.xml as is.
Sent from my iPhone
On Oct 3, 2014, at 5:02 PM, Edwin Chiu edwin.c...@manage.com wrote:
Are you using hive 0.13?
Switching back to HadoopDefaultAuthenticator in
Hi All,
Would really appreciate if someone in the community can help me with this.
I have a simple Java spark streaming application - NetworkWordCount
SparkConf sparkConf = new
SparkConf().setMaster("yarn-cluster").setAppName("Streaming WordCount");
JavaStreamingContext jssc = new
So some progress but still errors
object WordCount {
  def main(args: Array[String]) {
    if (args.length < 1) {
      System.err.println("Usage: WordCount <file>")
      System.exit(1)
    }
    val conf = new SparkConf().setMaster("local").setAppName(s"Whatever")
    val sc = new SparkContext(conf);
Hi,
I would prefer that PySpark could also be executed on Python 3.
Do you have some specific reason or demand to use PySpark with Python 3?
If you create an issue on JIRA, I will try to resolve it.
On 4 October 2014 06:47, Gen gen.tan...@gmail.com wrote:
According to the official site of spark, for the
It would be great if we supported Python 3 and I'd be happy to review any
pull requests to add it. I don't know that Python 3 is very widely-used,
but I'm open to supporting it if it won't require too much work.
By the way, we recently added support for PyPy:
Hi,
I am using a library that parses AIS messages. My code, which follows these
simple steps, gives me null values in the Date field:
1) Get the message from the file.
2) Parse the message.
3) Map the message RDD to keep only (Date, SomeInfo).
4) Take the top 100 elements.
Result: the Date field appears fine
Given an RDD with multiple lines of the form:
u'207.86.121.131 207.86.121.131 2012-11-27 13:02:17 titlestring 622592 27
184464'
(fields are separated by a space)
What PySpark function/commands do I use to filter out those lines where
line[8] >= x? (i.e. line[8] >= 125)
When I use line.split(" ") I get
Correction to my question. Step (5) should read:
5) Save the tuple RDD (created at step 3) to HDFS using saveAsTextFile.
Can someone please guide me in the right direction?
Thanks in advance
Manas
-
Manas Kar
It won't work with <value>org.apache.hadoop.hive.ql.security.HadoopDefaultAuthenticator</value>.
I just wonder how and why it works for you guys.
Here is the new error:
Exception in thread "main" java.lang.RuntimeException:
org.apache.hadoop.hive.ql.metadata.HiveException:
If I change it to
<value>org.apache.hadoop.hive.ql.security.authorization.HiveAuthorizationProvider</value>
the error becomes:
Exception in thread "main" java.lang.RuntimeException:
org.apache.hadoop.hive.ql.metadata.HiveException:
java.lang.RuntimeException: java.lang.NoSuchMethodException:
You don't have to include the breeze jar, which is already in the Spark assembly jar.
For the native one, it's optional.
Sent from my Google Nexus 5
On Oct 3, 2014 8:04 PM, Priya Ch learnings.chitt...@gmail.com wrote:
Yes, I have included breeze-0.9 in the build.sbt file. I'll change this to
0.7. Apart from
rdd.filter(lambda line: int(line.split(' ')[8]) >= 125)
On Fri, Oct 3, 2014 at 8:16 PM, Chop thomrog...@att.net wrote:
Given an RDD with multiple lines of the form:
u'207.86.121.131 207.86.121.131 2012-11-27 13:02:17 titlestring 622592 27
184464'
(fields are separated by a space)
What pyspark