Hi everyone,
I am using Spark 1.0.0 and I am facing some issues with handling binary
snappy-compressed Avro files which I get from HDFS. I know there are
improved mechanisms for handling these files in more recent versions of Spark,
but upgrading is not an option since I am operating on a Cloudera
Hi,
A BindException occurs when two processes try to use the same port. In your
Spark configuration, just set spark.ui.port to some other port x, where x can
be any free port number, say 12345. The BindException will not break your job
in either case; to fix it, just change the port number.
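For example, a minimal sketch in Scala (the app name and port are arbitrary
placeholders):

    import org.apache.spark.{SparkConf, SparkContext}

    // pick any free port for the Spark UI to avoid the BindException
    val conf = new SparkConf()
      .setAppName("MyApp") // hypothetical app name
      .set("spark.ui.port", "12345")
    val sc = new SparkContext(conf)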
Thanks.
On Fri, Nov
Any suggestions for addressing the described problem? In particular, given the
skewed degree of some of the item nodes in the graph, I believe it should be
possible to choose better block sizes to reflect that fact, but I am unsure
how to arrive at those sizes.
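A minimal sketch of the knob in question, assuming this concerns MLlib's ALS
(which the thread does not confirm):

    import org.apache.spark.mllib.recommendation.ALS

    // assumption: the default blocks = -1 auto-configures; an explicit
    // value could instead be sized to account for the degree skew
    val model = ALS.train(ratings, 10 /* rank */, 10 /* iterations */,
      0.01 /* lambda */, 100 /* blocks */)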
Debasish Das wrote
For spark-shell, my assumption is that the spark-shell -cp option should work
fine
Thanks for the suggestion, but this doesn't work. I tried:
./bin/spark-shell -cp commons-math3-3.2.jar -usejavacp
(apparently -cp is deprecated for the Scala shell as of 2.8, so -usejavacp is
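For what it's worth, spark-shell forwards its options to spark-submit in the
1.x releases, so the --jars flag may be a workable alternative (an untested
sketch on my part):

    # adds the jar to both the driver and executor classpaths
    ./bin/spark-shell --jars commons-math3-3.2.jar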
Hi, I have a similar problem. I modified the code in mllib and examples, and
then ran:
mvn install -pl mllib
mvn install -pl examples
But when I run the program in examples using run-example, the older version of
mllib (before the changes were made) is executed. How do I get the changes
made in mllib
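One guess: in the 1.x layout, run-example loads classes from the built
assembly jars rather than the per-module artifacts, so the assembly may need
rebuilding too (my assumption, untested):

    # rebuild the modified modules, then the assembly that run-example uses
    mvn install -pl mllib
    mvn install -pl examples
    mvn install -pl assembly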
Is there any news about this issue? I have checked Maven Central again and
the artefacts are still not there.
Regards,
Luis
2014-11-27 10:42 GMT+00:00 Luis Ángel Vicente Sánchez
langel.gro...@gmail.com:
I have just read on the website that Spark 1.1.1 has been released, but
when I upgraded
Hi,
Just so you know, I added PMML export for linear models (linear, ridge, and
lasso), as suggested by Xiangrui.
I will be looking at SVMs and logistic regression next.
Vincenzo
Is there any news about this issue? I was using a local folder in Linux
for checkpointing, file:///opt/sparkfolders/checkpoints. I think that
being able to use the ReliableKafkaReceiver in a 24x7 system without having
to worry about the disk filling up is a reasonable expectation.
Regards,
Luis
To make it simpler, for now forget the snappy compression. Just assume they
are binary Avro files...
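For reference, a minimal sketch of reading Avro container files in Spark 1.0
through the old Hadoop API (the path is a placeholder, and GenericRecord data
is assumed):

    import org.apache.avro.generic.GenericRecord
    import org.apache.avro.mapred.{AvroInputFormat, AvroWrapper}
    import org.apache.hadoop.io.NullWritable

    // each record arrives wrapped in an AvroWrapper; Snappy-compressed
    // blocks inside the container files should be decompressed
    // transparently as long as the snappy codec is on the classpath
    val records = sc.hadoopFile[AvroWrapper[GenericRecord], NullWritable,
      AvroInputFormat[GenericRecord]]("hdfs:///path/to/avro/dir")
    val datums = records.map(_._1.datum())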
We create a Spark context in an application running inside a WildFly container.
When the Spark context is created, we see the following entries in the WildFly
log. After log4j-default.properties is loaded, every log entry from Spark is
printed out twice. And after running for a while, we start to see deadlock
Hi,
My question is:
I have multiple filter operations where I split my initial RDD into two
different groups. The two groups cover the whole initial set. In code, it's
something like:
set1 = initial.filter(lambda x: x == something)
set2 = initial.filter(lambda x: x != something)
By doing
Thanks Sean, it did turn out to be a simple mistake after all. I appreciate
your help.
Jatin
On Thu, Nov 27, 2014 at 7:52 PM, sowen [via Apache Spark User List]
ml-node+s1001560n19975...@n3.nabble.com wrote:
No, the feature vector is not converted. It contains count n_i of how
often each
Are you sure it's a deadlock? Print the thread dump (from kill -QUIT) of
the thread(s) that are deadlocked, I suppose, to show where the issue
is. It seems unlikely that a logging thread would be holding locks
that the app uses.
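For example, with standard JDK tooling (the PID is a placeholder):

    # send SIGQUIT to the JVM to dump all thread stacks to stdout
    kill -QUIT <jvm-pid>
    # or, equivalently, with the JDK's jstack tool
    jstack <jvm-pid>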
On Fri, Nov 28, 2014 at 4:01 PM, Charles charles...@cenx.com wrote:
Here you go.
"Result resolver thread-3" - Thread t@35654
   java.lang.Thread.State: BLOCKED
        at java.io.PrintStream.flush(PrintStream.java:335)
        - waiting to lock <104f7200> (a java.io.PrintStream) owned by "null_Worker-1" t@1022
        at
[Ping]
Any hints?
On Thu, Nov 27, 2014 at 3:38 PM, Gerard Maas gerard.m...@gmail.com wrote:
Hi,
We are currently running our Spark + Spark Streaming jobs on Mesos,
submitting our jobs through Marathon.
We see with some regularity that the Spark Streaming driver gets killed by
Mesos and
This may help:
https://github.com/spark-jobserver/spark-jobserver
On Fri, Nov 28, 2014 at 6:59 AM, Jamal [via Apache Spark User List]
ml-node+s1001560n20007...@n3.nabble.com wrote:
Hi,
Any recommendations or tutorials on calling Spark from a Java web application?
Current setup:
A Spring Java
You probably don't need to create a new kind of SchemaRDD. Instead, I'd
suggest taking a look at the data sources API that we are adding in Spark
1.2. There is not a ton of documentation yet, but the test cases show how to
implement the various interfaces.
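As a rough illustration only (my sketch, not the official example; the class
names and schema are hypothetical), a full-table-scan source looks something
like:

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql._
    import org.apache.spark.sql.sources.{BaseRelation, RelationProvider, TableScan}

    // hypothetical provider: exposes a directory of text files as a
    // one-column table through the Spark 1.2 data sources API
    class DefaultSource extends RelationProvider {
      override def createRelation(
          sqlContext: SQLContext,
          parameters: Map[String, String]): BaseRelation =
        TextRelation(parameters("path"))(sqlContext)
    }

    case class TextRelation(path: String)
        (@transient val sqlContext: SQLContext)
      extends BaseRelation with TableScan {
      // a single string column named "line"
      override def schema: StructType =
        StructType(StructField("line", StringType, nullable = true) :: Nil)
      // full scan: one Row per line of input
      override def buildScan(): RDD[Row] =
        sqlContext.sparkContext.textFile(path).map(Row(_))
    }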
Hi Luis,
There seems to be a delay in the 1.1.1 artifacts being pushed to our Apache
mirrors. We are working with the infra people to get them up as soon as
possible. Unfortunately, due to the national holiday weekend in the US, this
may take a little longer than expected. For now you may
You can try (Scala version; you can convert it to Python):
val set = initial.groupBy(x => if (x == something) key1 else key2)
This would do one pass over the original data.
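In PySpark the equivalent would be roughly (my translation; key1 and key2 are
placeholder key values, as above):

    # one pass over the data; yields (key, iterable-of-values) pairs
    sets = initial.groupBy(lambda x: key1 if x == something else key2)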
On Fri, Nov 28, 2014 at 8:21 AM, mrm ma...@skimlinks.com wrote:
Hi,
My question is:
I have multiple filter operations where I