Hi,
Thanks for your response. I modified my code as per your suggestion, but
now I am getting a runtime error. Here's my code:
val df_1 = df.filter(df("event") === 0)
  .select("country", "cnt")
val df_2 = df.filter(df("event") === 3)
  .select("country", "cnt")
In production, I'd suggest having a high-availability cluster with a
minimum of 3 nodes (data nodes in your case).
Now let's examine your scenario:
- When you suddenly bring down one of the nodes which has 2 executors
running on it, what happens is that the node (DN2) will be having your jobs
Hi,
We have started R&D on Apache Spark to use its features such as Spark SQL and
Spark Streaming. I have two pain points; can any of you address them? They
are as follows:
1. Does Spark allow us to fetch updated items after an RDD
has been mapped and the schema has been
It says:
Tried to associate with unreachable remote address
[akka.tcp://sparkDriver@localhost:51849].
Address is now gated for 5000 ms, all messages to this address will be
delivered to dead letters. Reason: Connection refused: localhost/
127.0.0.1:51849
I'd suggest changing this property:
As it says, you have a jar conflict:
java.lang.NoSuchMethodError:
org.jboss.netty.channel.socket.nio.NioWorkerPool.<init>(Ljava/util/concurrent/Executor;I)V
Verify your classpath and check which Netty versions are on it.
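(Not from the original thread, but as a hedged pointer: with a Maven build, running mvn dependency:tree and grepping its output for "netty" shows which dependencies pull in which Netty version, so you can exclude or align the conflicting one.)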
Thanks
Best Regards
On Tue, Mar 24, 2015 at 11:07 PM, nitinkak001
Hi Sparkers,
I am trying to load data in Spark with the following command:
sqlContext.sql("LOAD DATA LOCAL INPATH '/home/spark12/sandeep/sandeep.txt' INTO TABLE src")
I am getting the exception below:
Server IPC version 9 cannot communicate with client version 4
Note: I am using Hadoop 2.2
The spark-csv package can handle the header row, and the code is at the link below.
It can also use the header to infer the field names in the schema.
https://github.com/databricks/spark-csv/blob/master/src/main/scala/com/databricks/spark/csv/CsvRelation.scala
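For reference, a minimal sketch of using it through Spark 1.3's generic load API (the file path and option values are illustrative, not from this thread):

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)  // sc is an existing SparkContext, e.g. from spark-shell
// With header = true, spark-csv takes the field names of the schema from the first row.
val df = sqlContext.load(
  "com.databricks.spark.csv",
  Map("path" -> "cars.csv", "header" -> "true"))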
--- Original Message ---
From: Dean
It means you already have 4 applications running on ports 4040, 4041, 4042,
and 4043, and that's why it was able to run on 4044.
You can do a netstat -pnat | grep 404 and see what processes are
running.
Thanks
Best Regards
On Wed, Mar 25, 2015 at 1:13 AM, Roy rp...@njit.edu wrote:
I get
Hello!
Thanks for your responses. I was afraid that due to partitioning I would
lose the logic that the first element is the header. I observe that
rdd.first calls the rdd.take(1) method behind the scenes.
Best regards,
Florin
On Tue, Mar 24, 2015 at 4:41 PM, Dean Wampler deanwamp...@gmail.com wrote:
Hi,
Can someone explain how the Spark Streaming CEP engine works?
How do I use it, with a sample example?
http://spark-packages.org/package/Stratio/streaming-cep-engine
Looks like you have to build Spark against the corresponding Hadoop version, otherwise
you will hit the exception as mentioned. You could follow this doc:
http://spark.apache.org/docs/latest/building-spark.html
2015-03-25 15:22 GMT+08:00 sandeep vura sandeepv...@gmail.com:
Hi Sparkers,
I am trying to load
I use the following commands to run the unit tests:
./make-distribution.sh --tgz --skip-java-test -Pscala-2.10 -Phadoop-2.3
-Phive -Phive-thriftserver -Pyarn -Dyarn.version=2.3.0-cdh5.1.2
-Dhadoop.version=2.3.0-cdh5.1.2
mvn -Pscala-2.10 -Phadoop-2.3 -Phive -Phive-thriftserver -Pyarn
I compiled spark-1.3.0 on Hadoop 2.3.0-cdh5.1.0 with protoc 2.5.0. But when I
try to run the examples, it throws: Exception in thread "main"
java.lang.VerifyError: class
org.apache.hadoop.yarn.proto.YarnProtos$PriorityProto overrides final method
Is your Spark compiled against Hadoop 2.2? If not, download the Spark 1.2
binary built for Hadoop 2.2 from
https://spark.apache.org/downloads.html
Thanks
Best Regards
On Wed, Mar 25, 2015 at 12:52 PM, sandeep vura sandeepv...@gmail.com
wrote:
Hi Sparkers,
I am trying to load data in spark with the
Hi, experts
I wrote a very simple Spark program to test the Kryo serialization function. The
code is as follows:
object TestKryoSerialization {
  def main(args: Array[String]) {
    val conf = new SparkConf()
    conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
Hi,
I am using Spark SQL to query my Hive cluster, following the Spark SQL
and DataFrame Guide
(https://spark.apache.org/docs/latest/sql-programming-guide.html) step by
step. However, my HiveQL via sqlContext.sql() fails and
java.lang.OutOfMemoryError is raised. The expected result of such
Where do I export MAVEN_OPTS, in spark-env.sh or hadoop-env.sh?
I am running the below command in the spark/yarn directory where the pom.xml file
is available:
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=VERSION -DskipTests clean package
Please correct me if I am wrong.
On Wed, Mar 25, 2015 at 12:55 PM,
Can you try giving the Spark driver more heap?
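(For example, a hedged sketch: pass --driver-memory 8g to spark-submit, or set spark.driver.memory in spark-defaults.conf; the value itself is only illustrative.)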
Cheers
On Mar 25, 2015, at 2:14 AM, Todd Leo sliznmail...@gmail.com wrote:
Hi,
I am using Spark SQL to query on my Hive cluster, following Spark SQL and
DataFrame Guide step by step. However, my HiveQL via sqlContext.sql() fails
and
Hi,
When I submit a Spark job in cluster mode I get the error below in the
hadoop-yarn log.
Does anyone have any idea? Please suggest.
2015-03-25 23:35:22,467 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
application_1427124496008_0028 State change from FINAL_SAVING to FAILED
Just run :
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.2 -DskipTests clean package
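Regarding MAVEN_OPTS: you export it in the shell you build from, not in spark-env.sh or hadoop-env.sh. The building guide linked earlier suggests something along the lines of export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M" before invoking mvn (treat the exact values as illustrative).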
Thanks
Best Regards
On Wed, Mar 25, 2015 at 3:08 PM, sandeep vura sandeepv...@gmail.com wrote:
Where do i export MAVEN_OPTS in spark-env.sh or hadoop-env.sh
I am running the below command in spark/yarn
It is OK when I query data from a small HDFS file.
But if the HDFS file is 152 MB, I get this exception.
I tried this code:
sc.setSystemProperty("spark.kryoserializer.buffer.mb", "256")
but the error is still there:
com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 0,
required: 39135
at
It seems that elasticsearch-spark_2.10 currently does not support Spark 1.3.
Could you tell me if there is an alternative way to save DataFrames to
Elasticsearch?
Oh, in that case you should mention 2.4. If you don't want to compile
Spark, then you can download the precompiled version from the Downloads page
(https://spark.apache.org/downloads.html).
http://d3kbcqa49mib13.cloudfront.net/spark-1.2.0-bin-hadoop2.4.tgz
Thanks
Best Regards
On Wed, Mar 25, 2015 at
Of course, VERSION is supposed to be replaced by a real Hadoop version!
On Wed, Mar 25, 2015 at 12:04 PM, sandeep vura sandeepv...@gmail.com wrote:
Build failed with following errors.
I have executed the below following command.
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=VERSION -DskipTests
Is there a way to have this dynamically pick the local IP?
Static assignment does not work because the workers are dynamically allocated
on Mesos.
On Wed, Mar 25, 2015 at 3:04 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:
It says:
Tried to associate with unreachable remote address
Remove SPARK_LOCAL_IP then?
Thanks
Best Regards
On Wed, Mar 25, 2015 at 6:45 PM, Anirudha Jadhav aniru...@nyu.edu wrote:
Is there a way to have this dynamically pick the local IP?
Static assignment does not work because the workers are dynamically allocated
on Mesos.
On Wed, Mar 25, 2015 at
Hi,
I am trying to run a Spark on YARN program provided by Spark in the examples
directory, using Amazon Kinesis on an EMR cluster.
I am using Spark 1.3.0 and EMR AMI 3.5.0.
I've set up the credentials:
export AWS_ACCESS_KEY_ID=XX
export AWS_SECRET_KEY=XXX
A) This is the Kinesis Word
You can open the Master UI running on port 8080 of your Ubuntu machine and,
after submitting the job, you can see how many cores are being used etc.
from the UI.
Thanks
Best Regards
On Wed, Mar 25, 2015 at 6:50 PM, James King jakwebin...@gmail.com wrote:
Thanks Akhil,
Yes indeed this is why it
Build failed with the following errors.
I executed the following command:
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=VERSION -DskipTests clean
package
[INFO]
[INFO] BUILD FAILURE
[INFO]
Did you build for Kinesis using the profile -Pkinesis-asl?
On Wed, Mar 25, 2015 at 7:18 PM, ankur.jain ankur.j...@yash.com wrote:
Hi,
I am trying to run a Spark on YARN program provided by Spark in the
examples
directory using Amazon Kinesis on EMR cluster :
I am using Spark 1.3.0 and EMR AMI:
Yes, I do have other applications already running.
Thanks for your explanation.
On Wed, Mar 25, 2015 at 2:49 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:
It means you are already having 4 applications running on 4040, 4041,
4042, 4043. And that's why it was able to run on 4044.
You can
-Dhadoop.version=2.2
Thanks
Best Regards
On Wed, Mar 25, 2015 at 5:34 PM, sandeep vura sandeepv...@gmail.com wrote:
Build failed with following errors.
I have executed the below following command.
* mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=VERSION -DskipTests clean
package*
[INFO]
Thanks Patrick and Michael for your responses.
For anyone else that runs across this problem prior to 1.3.1 being released,
I've been pointed to this Jira ticket that's scheduled for 1.3.1:
https://issues.apache.org/jira/browse/SPARK-6351
Thanks again.
You can use the AWS user-data feature.
Try this; it may help:
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-data.html
-
Software Developer
SigmoidAnalytics, Bangalore
I'm trying to run the Java NetworkWordCount example against a simple Spark
standalone runtime of one master and one worker.
But it doesn't seem to work; the text entered on the netcat data server is
not being picked up and printed to the Eclipse console output.
However if I use
Hi Martin, here are 2 possibilities to overcome this:
1) Put your logic into the org.apache.spark package in your project - then
everything would be accessible.
2) Dirty trick:
object SparkVector extends HashingTF { val VectorUDT: DataType = outputDataType }
then you can do like this:
Hi Sachin,
It appears that the application master is failing. To figure out what's
wrong you need to get the logs for the application master.
-Sandy
On Wed, Mar 25, 2015 at 7:05 AM, Sachin Singh sachin.sha...@gmail.com
wrote:
The OS I am using is Linux.
When I run simply with master yarn, it
Oh, just noticed that you were calling sc.setSystemProperty. Actually
you need to set this property in SparkConf or in spark-defaults.conf.
And there are two configurations related to Kryo buffer size:
* spark.kryoserializer.buffer.mb, which is the initial size, and
*
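For example, a minimal sketch of setting it through SparkConf (these are the pre-1.4 property names from this thread, and the buffer value is only illustrative):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("TestKryoSerialization")
  // Switch to Kryo and enlarge the serializer buffer before creating the SparkContext.
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryoserializer.buffer.mb", "256")
val sc = new SparkContext(conf)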
I have a simple and probably dumb question about foreachRDD.
We are using Spark Streaming + Cassandra to compute concurrent users every
5 min. Our batch size is 10 secs and our block interval is 2.5 secs.
At the end of the world we are using foreachRDD to join the data in the RDD
with existing data
Spark Streaming requires you to have a minimum of 2 cores, 1 for receiving
your data and the other for processing. So when you say local[2] it
basically initializes 2 threads on your local machine, 1 for receiving data
from the network and the other for your word count processing.
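As a hedged illustration of that setup (host, port and batch interval are placeholders, not from your job):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// local[2]: one thread receives from the socket, the other runs the word count.
val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
val ssc = new StreamingContext(conf, Seconds(10))
val counts = ssc.socketTextStream("localhost", 9999)
  .flatMap(_.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
counts.print()
ssc.start()
ssc.awaitTermination()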
Thanks
Best Regards
I am using Hadoop 2.4; should I mention -Dhadoop.version=2.2?
$ hadoop version
Hadoop 2.4.1
Subversion http://svn.apache.org/repos/asf/hadoop/common -r 1604318
Compiled by jenkins on 2014-06-21T05:43Z
Compiled with protoc 2.5.0
From source
Thanks Akhil,
Yes indeed, this is why it works when using local[2], but I'm unclear why
it doesn't work when using standalone daemons.
Is there a way to check what cores are being seen when running against
standalone daemons?
I'm running the master and worker on the same Ubuntu host. The Driver
You're welcome. How did it go?
Irfan Ahmad
CTO | Co-Founder | CloudPhysics http://www.cloudphysics.com
Best of VMworld Finalist
Best Cloud Management Award
NetworkWorld 10 Startups to Watch
EMA Most Notable Vendor
On Wed, Mar 25, 2015 at 7:53 AM, Ashish Mukherjee
ashish.mukher...@gmail.com
Hi
I've succeeded in writing a Kafka stream to a Parquet file in Spark 1.2, but I can't
make it work with Spark 1.3.
As in streaming I can't use saveAsParquetFile(), because I can't add data to
an existing Parquet file.
I know that it's possible to stream data directly into Parquet.
Could you help me by
Spark 1.3 is not supported by elasticsearch-hadoop yet but will be very soon:
https://github.com/elastic/elasticsearch-hadoop/issues/400
However, in the meantime you could use df.toRDD.saveToEs - though you may have
to manipulate the Row object, perhaps to extract fields; not sure if it will
I've been given a feature requirement that means processing events at a
latency lower than 0.25 ms.
Meaning I would have to make sure that Spark Streaming gets new events from
the messaging layer within that period of time. Would anyone have achieved
such numbers using a Spark cluster? Or would
I have an EC2 cluster created using Spark version 1.2.1,
and I have an SBT project.
Now I want to upgrade to Spark 1.3 and use the new features.
Below are the issues.
Sorry for the long post.
Appreciate your help.
Thanks
-Roni
Question - Do I have to create a new cluster using spark 1.3?
Here is
I don't think it's feasible to set a batch interval of 0.25ms. Even at
tens of ms the overhead of the framework is a large factor. Do you
mean 0.25s = 250ms?
Related thoughts, and I don't know if they apply to your case:
If you mean, can you just read off the source that quickly? yes.
Sometimes
I have a Spark SQL DataFrame with a few billion rows that I need to
quickly filter down to a few hundred thousand rows, using an operation like
(syntax may not be correct)
df = df[ df.filter(lambda x: x.key_col in approved_keys)]
I was thinking about serializing the data using parquet and
I want to use the restore-from-checkpoint feature to continue from the last accumulated
word counts and process new streams of data. This recovery process will keep
an accurate state of the accumulated counters (calculated by
updateStateByKey) after failure/recovery or a temporary shutdown/upgrade to new
code.
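For what it's worth, a hedged sketch of that recovery pattern (the checkpoint path and batch interval are placeholders, and the updateStateByKey pipeline itself is elided):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val checkpointDir = "hdfs:///checkpoints/wordcount"

def createContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("StatefulWordCount")
  val ssc = new StreamingContext(conf, Seconds(10))
  ssc.checkpoint(checkpointDir)
  // build the DStream and the updateStateByKey computation here
  ssc
}

// Reloads the accumulated state from the checkpoint if one exists,
// otherwise builds a fresh context with createContext.
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()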
Hi guys, I am running the following function with spark-submit and the OS is
killing my process:
def getRdd(self, date, provider):
    path = 's3n://' + AWS_BUCKET + '/' + date + '/*.log.gz'
    log2 = self.sqlContext.jsonFile(path)
    log2.registerTempTable('log_test')
    log2.cache()
I am facing the same issue and have posted a new thread. Please respond.
On Wed, Jan 14, 2015 at 4:38 AM, Zhan Zhang zzh...@hortonworks.com wrote:
Hi Folks,
I am trying to run the Hive context in yarn-cluster mode, but met some errors.
Does anybody know what causes the issue?
I use the following cmd to build
I solved this by increasing the PermGen memory size in the driver:
-XX:MaxPermSize=512m
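(If you launch through spark-submit, a hedged way to pass this is --driver-java-options "-XX:MaxPermSize=512m", or the equivalent spark.driver.extraJavaOptions entry in spark-defaults.conf.)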
Thanks.
Zhan Zhang
On Mar 25, 2015, at 10:54 AM, ÐΞ€ρ@Ҝ (๏̯͡๏)
deepuj...@gmail.commailto:deepuj...@gmail.com wrote:
I am facing same issue, posted a new thread. Please respond.
On Wed, Jan 14, 2015 at 4:38 AM,
For the Spark SQL parts, 1.3 breaks backwards compatibility, because before
1.3, Spark SQL was considered experimental where API changes were allowed.
So, H2O and ADA compatible with 1.2.X might not work with 1.3.
dean
Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
http://spark.apache.org/docs/1.3.0/sql-programming-guide.html#hive-tables
I modified the Hive query but ran into the same error. (
http://spark.apache.org/docs/1.3.0/sql-programming-guide.html#hive-tables)
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
I am facing same issue, posted a new thread. Please respond.
On Wed, Jul 9, 2014 at 1:56 AM, Rahul Bhojwani rahulbhojwani2...@gmail.com
wrote:
Hi,
My code was running properly but then it suddenly gave this error. Can you
shed some light on it?
###
0 KB, free: 38.7
What version of Spark do the other dependencies rely on (Adam and H2O?) - that
could be it
Or try sbt clean compile
—
Sent from Mailbox
On Wed, Mar 25, 2015 at 5:58 PM, roni roni.epi...@gmail.com wrote:
I have a EC2 cluster created using spark version 1.2.1.
And I have a SBT project .
Even if H2O and ADA are dependent on 1.2.1, it should be backward
compatible, right?
So using 1.3 should not break them.
And the code is not using the classes from those libs.
I tried sbt clean compile .. same error
Thanks
_R
On Wed, Mar 25, 2015 at 9:26 AM, Nick Pentreath
Thanks. This library is only available with Spark 1.3. I am using version
1.2.1. Before I upgrade to 1.3, I want to try what can be done in 1.2.1.
So I am using the following:
val MyDataset = sqlContext.sql("my select query")
MyDataset.map(t =>
As you noted, you can change the spark.driver.maxResultSize value in your
Spark configuration (https://spark.apache.org/docs/1.2.0/configuration.html).
Please reference the Spark Properties section, noting that you can modify
these properties via spark-defaults.conf or via SparkConf().
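For instance, a hedged one-liner (the 2g value is illustrative):

import org.apache.spark.SparkConf

val conf = new SparkConf().set("spark.driver.maxResultSize", "2g")

or the equivalent line spark.driver.maxResultSize 2g in spark-defaults.conf.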
HTH!
I have a YARN cluster where the max memory allowed is 16 GB. I set 12 GB for
my driver; however, I see an OutOfMemory error even for this program:
http://spark.apache.org/docs/1.3.0/sql-programming-guide.html#hive-tables
What do you suggest?
On Wed, Mar 25, 2015 at 8:23 AM, Thomas Gerber
Unfortunately you are now hitting a bug (that is fixed in master and will
be released in 1.3.1 hopefully next week). However, even with that your
query is still ambiguous and you will need to use aliases:
val df_1 = df.filter(df("event") === 0)
  .select("country", "cnt").as("a")
val
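(The quoted snippet is cut off above; a fuller hedged sketch of the alias approach, with column, alias and join-type names assumed from this thread, might look like:)

import org.apache.spark.sql.functions.col

val a = df.filter(df("event") === 0).select("country", "cnt").as("a")
val b = df.filter(df("event") === 3).select("country", "cnt").as("b")
// Referring to the aliases makes the join condition unambiguous.
val both = b.join(a, col("b.country") === col("a.country"), "left_outer")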
Ah I see now you are trying to use a spark 1.2 cluster - you will need to be
running spark 1.3 on your EC2 cluster in order to run programs built against
spark 1.3.
You will need to terminate and restart your cluster with spark 1.3
—
Sent from Mailbox
On Wed, Mar 25, 2015 at 6:39 PM,
Hi,
I have a RDD of pairs of strings like below :
(A,B)
(B,C)
(C,D)
(A,D)
(E,F)
(B,F)
I need to transform/filter this into a RDD of pairs that does not repeat a
string once it has been used once. So something like ,
(A,B)
(C,D)
(E,F)
(B,C) is out because B has already been used in (A,B), (A,D)
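For what it's worth, a hedged, driver-side sketch of the greedy rule described above (plain Scala collections, purely to pin down the expected output; not a distributed solution):

// Scan in input order; keep a pair only if neither string has been used before.
val pairs = Seq(("A","B"), ("B","C"), ("C","D"), ("A","D"), ("E","F"), ("B","F"))
val (kept, _) = pairs.foldLeft((Vector.empty[(String, String)], Set.empty[String])) {
  case ((acc, used), (x, y)) =>
    if (used.contains(x) || used.contains(y)) (acc, used)
    else (acc :+ ((x, y)), used + x + y)
}
// kept == Vector((A,B), (C,D), (E,F))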
You should also try increasing the perm gen size: -XX:MaxPermSize=512m
On Wed, Mar 25, 2015 at 2:37 AM, Ted Yu yuzhih...@gmail.com wrote:
Can you try giving Spark driver more heap ?
Cheers
On Mar 25, 2015, at 2:14 AM, Todd Leo sliznmail...@gmail.com wrote:
Hi,
I am using *Spark SQL*
What would it do with the following dataset?
(A, B)
(A, C)
(B, D)
On Wed, Mar 25, 2015 at 1:02 PM, Himanish Kushary himan...@gmail.com
wrote:
Hi,
I have a RDD of pairs of strings like below :
(A,B)
(B,C)
(C,D)
(A,D)
(E,F)
(B,F)
I need to transform/filter this into a RDD of pairs
The only way to do it using Python currently is to use the string-based
filter API (where you pass us an expression as a string, and we parse it
using our SQL parser).
from pyspark.sql import Row
from pyspark.sql.functions import *
df = sc.parallelize([Row(name="test")]).toDF()
df.filter(name in
Thanks Dean and Nick.
So, I removed the ADAM and H2o from my SBT as I was not using them.
I got the code to compile - only to fail while running with -
SparkContext: Created broadcast 1 from textFile at kmerIntersetion.scala:21
Exception in thread main java.lang.NoClassDefFoundError:
It will only give (A,B). I am generating the pairs from combinations of
the strings A, B, C and D, so the pairs (ignoring order) would be
(A,B),(A,C),(A,D),(B,C),(B,D),(C,D)
On successful filtering using the original condition it will transform to
(A,B) and (C,D)
On Wed, Mar 25, 2015 at 3:00
Weird. Are you running using SBT console? It should have the spark-core jar
on the classpath. Similarly, spark-shell or spark-submit should work, but
be sure you're using the same version of Spark when running as when
compiling. Also, you might need to add spark-sql to your SBT dependencies,
but
Hello Jeremy,
Sorry for the delayed reply!
First issue was resolved, I believe it was just production and consumption
rate problem.
Regarding the second question, I am streaming the data from the file and
there are about 38k records. I am sending the streams in the same sequence
as I am reading
Until then you can try
sql("SET spark.sql.parquet.useDataSourceApi=false")
On Wed, Mar 25, 2015 at 12:15 PM, Michael Armbrust mich...@databricks.com
wrote:
This will be fixed in Spark 1.3.1:
https://issues.apache.org/jira/browse/SPARK-6351
and is fixed in master/branch-1.3 if you want to
That probably means there are not enough free resources in your cluster
to run the AM for the Spark job. Check your RM's web UI to see the
resources you have available.
On Wed, Mar 25, 2015 at 12:08 PM, Khandeshi, Ami
ami.khande...@fmr.com.invalid wrote:
I am seeing the same behavior. I have
My example is a totally reasonable way to do it; it just requires
constructing strings.
In many cases you can also do it with column objects:
df[df.name == "test"].collect()
Out[15]: [Row(name=u'test')]
You should check out:
Hi,
Thanks for your response. I am not clear about why the query is ambiguous.
val both = df_2.join(df_1, df_2("country") === df_1("country"), "left_outer")
I thought df_2("country") === df_1("country") indicates that the country
field in the 2 dataframes should match, and df_2("country") is the
equivalent of
Hi Davies, I am running 1.1.0.
Now I'm following this thread, which recommends using the batchsize parameter = 1:
http://apache-spark-user-list.1001560.n3.nabble.com/pySpark-memory-usage-td3022.html
If this does not work I will install 1.2.1 or 1.3.
Regards
On Wed, Mar 25, 2015 at 3:39 PM, Davies
I am seeing the same behavior. I have enough resources. How do I resolve
it?
Thanks,
Ami
This will be fixed in Spark 1.3.1:
https://issues.apache.org/jira/browse/SPARK-6351
and is fixed in master/branch-1.3 if you want to compile from source
On Wed, Mar 25, 2015 at 11:59 AM, Stuart Layton stuart.lay...@gmail.com
wrote:
I'm trying to save a dataframe to s3 as a parquet file but I'm
Hi
Is there a way to write all RDDs in a DStream to the same file?
I tried this and got an empty file. I think it's because the file is not closed, i.e.
ESMinibatchFunctions.writer.close() executes before the stream is created.
Here's my code:
myStream.foreachRDD(rdd => {
  rdd.foreach(x => {
What's the version of Spark you are running?
There is a bug in the SQL Python API [1]; it's fixed in 1.2.1 and 1.3.
[1] https://issues.apache.org/jira/browse/SPARK-6055
On Wed, Mar 25, 2015 at 10:33 AM, Eduardo Cusa
eduardo.c...@usmediaconsulting.com wrote:
Hi Guys, I running the following
Thanks for the response, I was using IN as an example of the type of
operation I need to do. Is there another way to do this that lines up more
naturally with the way things are supposed to be done in SparkSQL?
On Wed, Mar 25, 2015 at 2:29 PM, Michael Armbrust mich...@databricks.com
wrote:
The
This is a different kind of error. Thomas' OOM error was specific to the
kernel refusing to create another thread/process for his application.
Matthew
On Wed, Mar 25, 2015 at 10:51 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote:
I have a YARN cluster where the max memory allowed is 16GB. I set
I'm trying to save a dataframe to s3 as a parquet file but I'm getting
Wrong FS errors
df.saveAsParquetFile(parquetFile)
15/03/25 18:56:10 INFO storage.MemoryStore: ensureFreeSpace(46645) called
with curMem=82744, maxMem=278302556
15/03/25 18:56:10 INFO storage.MemoryStore: Block broadcast_5
With batchSize = 1, I think it will become even worse.
I'd suggest going with 1.3, to have a taste of the new DataFrame API.
On Wed, Mar 25, 2015 at 11:49 AM, Eduardo Cusa
eduardo.c...@usmediaconsulting.com wrote:
Hi Davies, I running 1.1.0.
Now I'm following this thread that recommend use
That's a good question. In this particular example, it is really only
internal implementation details that make it ambiguous. However, fixing
this was a very large change so we have deferred it to Spark 1.4 and instead
print a warning now when you construct trivially equal expressions. I can
try
I think the problem is the use of the loopback address:
export SPARK_LOCAL_IP=127.0.0.1
In the stack trace from the slave, you see this:
... Reason: Connection refused: localhost/127.0.0.1:51849
akka.actor.ActorNotFound: Actor not found for:
Recall that the input isn't actually read until you do something that forces
evaluation, like calling saveAsTextFile. You didn't show the whole stack trace
here, but it probably occurred while parsing an input line where one of
your long fields is actually an empty string.
Because this is such a
Yes, that's the problem. The RDD class exists in both binary jar files, but
the signatures probably don't match. The bottom line, as always for tools
like this, is that you can't mix versions.
Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
Hi, I am running Spark on Mesos and getting this error. Can anyone help me
resolve this? Thanks.
15/03/25 21:05:00 ERROR scheduler.LiveListenerBus:
Listener EventLoggingListener threw an exception
java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown
If I run the following:
db = sqlContext.load("jdbc", url="jdbc:postgresql://localhost/xx",
dbtables="mstr.d_customer")
I get the error:
py4j.protocol.Py4JJavaError: An error occurred while calling o28.load.
: java.io.FileNotFoundException: File file:/Users/elliottcordo/jdbc does
not exist
Seems to
My cluster is still on Spark 1.2 and in SBT I am using 1.3.
So probably it is compiling with 1.3 but running with 1.2?
On Wed, Mar 25, 2015 at 12:34 PM, Dean Wampler deanwamp...@gmail.com
wrote:
Weird. Are you running using SBT console? It should have the spark-core
jar on the classpath.
Hi Arunkumar,
I think L-BFGS will not work since the L-BFGS algorithm assumes that the
objective function will always be the same (i.e., the data is the
same) for the entire optimization process, in order to construct the approximated
Hessian matrix. In the streaming case, the data will be changing, so
it will
Try:
db = sqlContext.load(source="jdbc", url="jdbc:postgresql://localhost/xx",
dbtables="mstr.d_customer")
On Wed, Mar 25, 2015 at 2:19 PM, elliott cordo elliottco...@gmail.com
wrote:
if i run the following:
db = sqlContext.load("jdbc", url="jdbc:postgresql://localhost/xx",
dbtables="mstr.d_customer")
There is no configuration for it now.
Best Regards,
Shixiong Zhu
2015-03-26 7:13 GMT+08:00 Manoj Samel manojsamelt...@gmail.com:
There may be firewall rules limiting the ports between the host running Spark
and the Hadoop cluster. In that case, not all ports are allowed.
Can it be a range of
Thanks for following up. I'll fix the docs.
On Wed, Mar 25, 2015 at 4:04 PM, elliott cordo elliottco...@gmail.com
wrote:
Thanks! The below worked:
db = sqlCtx.load(source="jdbc",
url="jdbc:postgresql://localhost/x?user=x&password=x", dbtable="mstr.d_customer")
Note that
Yeah sorry, this is already fixed but we need to republish the docs. I'll
add that both of the following do work:
people.filter("age > 30")
people.filter(people("age") > 30)
On Tue, Mar 24, 2015 at 7:11 PM, SK skrishna...@gmail.com wrote:
The following statement appears in the Scala API example at
Is there any way that I can install the new one and remove the previous version?
I installed Spark 1.3 on my EC2 master and set the Spark home to the new
one.
But when I start the spark-shell I get -
java.lang.UnsatisfiedLinkError:
org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative()V
Thanks Peter,
I ended up doing something similar. I however consider both the approaches
you mentioned bad practices which is why I was looking for a solution
directly supported by the current code.
I can work with that now, but it does not seem to be the proper solution.
Regards,
Hi everyone,
I am considering moving from Spark-Standalone to YARN. The context is that
there are multiple Spark applications that are using different versions of
Spark that all want to use the same YARN cluster.
My question is: if I use a single Spark YARN shuffle service jar on the Node