Thanks TD.
On Wed, Dec 31, 2014 at 7:19 AM, Tathagata Das tathagata.das1...@gmail.com
wrote:
1. Of course, a single block / partition has many Kafka messages, and
messages from different Kafka topics can be interleaved together. The message
count is not related to the block count. Any message received within
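For context, a minimal sketch of the block-interval setting that governs this
grouping (the app name and value are illustrative; in this era of Spark the
interval was given in milliseconds):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("KafkaIngest") // hypothetical app name
  // received messages are grouped into a new block every blockInterval ms,
  // and each block becomes one partition of the batch's RDD
  .set("spark.streaming.blockInterval", "200")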
hey guys
My dataset is like this
025126,Chills,8.10,Injection site oedema,8.10,Injection site
reaction,8.10,Malaise,8.10,Myalgia,8.10
Intended output is:
025126,Chills
025126,Injection site oedema
025126,Injection site reaction
025126,Malaise
025126,Myalgia
My code is as
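For reference, a minimal sketch (not necessarily the original code) that
produces that output, assuming the symptom names sit at the odd field indices
of each comma-separated line:

val lines = sc.textFile("vaers.csv") // input path is hypothetical
val pairs = lines.flatMap { line =>
  val fields = line.split(',')
  // fields(0) is the id; symptoms sit at indices 1, 3, 5, ... between the "8.10" values
  fields.zipWithIndex.collect { case (f, i) if i % 2 == 1 => s"${fields(0)},$f" }
}
pairs.saveAsTextFile("output") // one "id,symptom" line per record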
Error from python worker:
python: module pyspark.daemon not found
PYTHONPATH was:
/home/npokala/data/spark-install/spark-master/python:
Can somebody please help me resolve this issue?
-Naveen
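A common cause is that the worker nodes can't see the pyspark package on
their PYTHONPATH. A possible fix, assuming a standard source checkout (the
paths and the py4j version are illustrative):

# make sure every node, not just the driver, sees the same Spark python libs
export SPARK_HOME=/home/npokala/data/spark-install/spark-master
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH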
Hi, Xiaoyu!
In the Spark Thrift Server you have to use
`spark.sql.thriftserver.scheduler.pool` instead of `spark.scheduler.pool`.
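For example, from a JDBC session connected to the Thrift Server (the pool
name is illustrative):

SET spark.sql.thriftserver.scheduler.pool=fair_pool;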
On Wed, Dec 31, 2014 at 3:55 PM, Xiaoyu Wang wangxy...@gmail.com wrote:
Hi all!
I use Spark SQL 1.2 to start the Thrift Server on YARN.
I want to use the fair scheduler
Hi,
I want to use Spark Streaming to read data from Cassandra.
But in my case I need to process data based on events (not retrieve the data
from Cassandra constantly).
Question:
what is the way to trigger the processing with Spark Streaming from time
to time?
Thanks
Oleg.
Why don't you push "\n" instead of "\t" in your first transformation [
(fields(0),(fields(1)+"\t"+fields(3)+"\t"+fields(5)+"\t"+fields(7)+"\t"+fields(9)))]
and then do saveAsTextFile?
-Raghavendra
On Wed Dec 31 2014 at 1:42:55 PM Sanjay Subramanian
sanjaysubraman...@yahoo.com.invalid wrote:
hey guys
My
I am accessing HDFS with Spark's .textFile method, and I receive this error:
Exception in thread "main" org.apache.hadoop.ipc.RemoteException: Server IPC
version 9 cannot communicate with client version 4
here are my dependencies
This generally means you have packaged Hadoop 1.x classes into your
app accidentally. The most common cause is not marking Hadoop and
Spark classes as provided dependencies. Your app doesn't need to
ship its own copy of these classes when you use spark-submit.
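For example, in build.sbt (a sketch; the versions shown are illustrative):

libraryDependencies ++= Seq(
  // both marked "provided": spark-submit supplies them at runtime
  "org.apache.spark" %% "spark-core" % "1.2.0" % "provided",
  "org.apache.hadoop" % "hadoop-client" % "2.4.0" % "provided"
)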
On Wed, Dec 31, 2014 at 10:47 AM,
Hi,
I am receiving the following error when I try to connect to a Spark cluster
(which is on Unix) from my Windows machine using the pyspark interactive shell:
pyspark --master (spark cluster url)
Then I executed the following commands.
lines = sc.textFile("hdfs://master/data/spark/SINGLE.TXT")
Hi all,
I am currently trying to combine datastax's spark-cassandra-connector and
typesafe's akka-http-experimental
on Spark 1.1.1 (the spark-cassandra-connector for Spark 1.2.0 is not out yet)
and Scala 2.10.4.
I am using the Hadoop 2.4 pre-built package (build.sbt file at the end).
To solve the
Hi J_soft,
for me it is working; I didn't put the -Dscala-2.10 or -X parameters. I got only
one warning: since I don't have Hadoop 2.5, it didn't activate this profile:
larix@kovral:~/sources/spark-1.2.0$ mvn -Pyarn -Phadoop-2.5 \
  -Dhadoop.version=2.5.0 -DskipTests clean package
Found 0 infos
Hi Sanjay,
I tried running your code on spark shell piece by piece –
// Setup
val line1 = "025126,Chills,8.10,Injection site oedema,8.10,Injection site
reaction,8.10,Malaise,8.10,Myalgia,8.10"
val line2 = "025127,Chills,8.10,Injection site oedema,8.10,Injection site
Hi Sanjay,
Doing an if inside a map sounds like a bad idea; it seems like you actually
want to filter and then apply map, as in the sketch below.
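A sketch (the field count of 10 comes from Sanjay's sample data):

// drop malformed rows first, then map without any defensive if
val cleaned = lines.filter(_.split(',').length >= 10)
val pairs = cleaned.map { line =>
  val fields = line.split(',')
  (fields(0), fields(1) + "\t" + fields(3) + "\t" + fields(5) + "\t" + fields(7) + "\t" + fields(9))
}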
On Wed, Dec 31, 2014 at 9:54 AM, Kapil Malik kma...@adobe.com wrote:
Hi Sanjay,
I tried running your code on spark shell piece by piece –
// Setup
val line1 =
Hi All,
I am trying to run a sample Spark program using Scala SBT,
Below is the program,
def main(args: Array[String]) {
val logFile = "E:/ApacheSpark/usb/usb/spark/bin/README.md" // Should
be some file on your system
val sc = new SparkContext("local", "Simple App",
If you look at your program output closely, you can see the following output.
Lines with a: 24, Lines with b: 15
The exception seems to be happening with Spark cleanup after executing your
code. Try adding sc.stop() at the end of your program to see if the exception
goes away.
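For instance, a sketch of the suggested change:

def main(args: Array[String]) {
  val sc = new SparkContext("local", "Simple App")
  try {
    // ... existing logic that counts lines with a and b ...
  } finally {
    sc.stop() // release Spark's resources cleanly before the JVM exits
  }
}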
On
Hey Kapil, Fernando
Thanks for your mail.
[1] Fernando, if I don't use if logic inside the map and some input lines
have fewer fields than I am expecting, I get an ArrayIndexOutOfBoundsException,
so the if is there to safeguard against that.
[2] Kapil, I am sorry I did not clarify. Yes
Yes, the exception is gone now after adding stop() at the end.
Can you please tell me what this stop() at the end does? Does it disable
the SparkContext?
On Wed, Dec 31, 2014 at 10:09 AM, RK prk...@yahoo.com wrote:
If you look at your program output closely, you can see the following
output.
The previously submitted code doesn't actually demonstrate the problem I was
trying to show, since the issue only becomes clear between subsequent steps;
within a single step things appear to be cleaned up properly. The memory usage
becomes evident pretty quickly.
def showMemoryUsage(sc:
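One way such a helper could look (a hypothetical sketch, not the original
code, built on the getExecutorMemoryStatus API):

def showMemoryUsage(sc: SparkContext): Unit = {
  // maps each executor to (max memory, remaining memory), both in bytes
  sc.getExecutorMemoryStatus.foreach { case (executor, (max, remaining)) =>
    val usedMb = (max - remaining) / (1024 * 1024)
    println(s"$executor: $usedMb MB used of ${max / (1024 * 1024)} MB")
  }
}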
Hi Sanjay,
Oh yes .. on flatMapValues, it's defined in PairRDDFunctions, and you need to
import org.apache.spark.SparkContext._ to use it
(http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions
)
@Sean, yes indeed flatMap / flatMapValues both can
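For Sanjay's dataset, that could look something like this (a sketch; it
assumes the symptom names sit at the odd field indices):

import org.apache.spark.SparkContext._ // brings PairRDDFunctions into scope (Spark 1.x)

val pairs = lines.map { line =>
  val fields = line.split(',')
  (fields(0), fields.zipWithIndex.collect { case (f, i) if i % 2 == 1 => f }.toSeq)
}
// one (id, symptom) pair per symptom, then back to "id,symptom" lines
val exploded = pairs.flatMapValues(identity).map { case (id, s) => s"$id,$s" }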
Hi Naveen,
Quoting
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.SparkContext
SparkContext is the main entry point for Spark functionality. A SparkContext
represents the connection to a Spark cluster, and can be used to create RDDs,
accumulators and broadcast variables on that cluster.
-dev, +user
A decent guess: Does your 'save' function entail collecting data back
to the driver? And are you running this from a machine that's not in
your Spark cluster? Then in client mode you're shipping data back to a
less-nearby machine compared to cluster mode. That could explain
the
Also the job was deployed from the master machine in the cluster.
On Wed, Dec 31, 2014 at 6:35 PM, Enno Shioji eshi...@gmail.com wrote:
Oh sorry, that was an edit mistake. The code is essentially:
val msgStream = kafkaStream
  .map { case (k, v) => v }
Oh sorry, that was an edit mistake. The code is essentially:
val msgStream = kafkaStream
  .map { case (k, v) => v }
  .map(DatatypeConverter.printBase64Binary)
  .saveAsTextFile("s3n://some.bucket/path", classOf[LzoCodec])
I.e. there is essentially no original code (I was calling
Hi,
Where does the following path that appears in the logs below come from?
/opt/xdsp/spark-1.2.0/H:\Soft\Maven\repository/org/scala-lang/scala-compiler/2.10.4/scala-compiler-2.10.4.jar
Did you somehow point at the local maven repository that's H:\Soft\Maven?
Jacek
31 Dec 2014 01:48 j_soft
We actually do publish our own version of this jar, because the version
that the hive team publishes is an uber jar and this breaks all kinds of
things. As a result I'd file the JIRA against Spark.
On Wed, Dec 31, 2014 at 12:55 PM, Ted Yu yuzhih...@gmail.com wrote:
Michael:
I see.
I logged SPARK-5041 which references this thread.
Thanks
On Wed, Dec 31, 2014 at 12:57 PM, Michael Armbrust mich...@databricks.com
wrote:
We actually do publish our own version of this jar, because the version
that the hive team publishes is an uber jar and this breaks all kinds of
bumping this thread up
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/UpdateStateByKey-persist-to-Tachyon-tp20798p20930.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
What are your spark-submit commands in both cases? Is it Spark Standalone or
YARN (both support client and cluster)? Accordingly, what are the numbers of
executors/cores requested?
TD
On Wed, Dec 31, 2014 at 10:36 AM, Enno Shioji eshi...@gmail.com wrote:
Also the job was deployed from the master
Is there a limit function which just returns the first N records?
Sample is nice, but I'm trying to do this so it's super fast, just to
test the functionality of an algorithm.
With sample I'd have to first compute the % that would yield 1000 results…
Kevin
--
Founder/CEO Spinn3r.com
There's a take method that might do what you need:
def take(num: Int): Array[T]
Take the first num elements of the RDD.
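For example (rdd here stands for any RDD):

val first1000 = rdd.take(1000) // returns the first 1000 elements to the driver as an Array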
On Jan 1, 2015 12:02 AM, Kevin Burton bur...@spinn3r.com wrote:
Is there a limit function which just returns the first N records?
Sample is nice but I’m trying to
Hi,
I get the following exception when I submit a Spark application that
calculates the frequency of characters in a file. I face this problem
especially when I increase the size of the data:
Exception in thread "Thread-47" org.apache.spark.SparkException: Job
aborted due to stage failure: Task
Which version of Spark are you using?
On Wed, Dec 31, 2014 at 10:24 PM, rapelly kartheek kartheek.m...@gmail.com
wrote:
Hi,
I get the following exception when I submit a Spark application that
calculates the frequency of characters in a file. Especially when I
increase the size of the data, I
This is really weird and I’m surprised no one has found this issue yet.
I’ve spent about an hour or more trying to debug this :-(
My spark install is ignoring ALL my memory settings. And of course my job
is running out of memory.
The default is 512MB so pretty darn small.
The worker and
spark-1.0.0
On Thu, Jan 1, 2015 at 12:04 PM, Josh Rosen rosenvi...@gmail.com wrote:
Which version of Spark are you using?
On Wed, Dec 31, 2014 at 10:24 PM, rapelly kartheek
kartheek.m...@gmail.com wrote:
Hi,
I get the following exception when I submit a Spark application that
calculates
-- Forwarded message --
From: rapelly kartheek kartheek.m...@gmail.com
Date: Thu, Jan 1, 2015 at 12:05 PM
Subject: Re: NullPointerException
To: Josh Rosen rosenvi...@gmail.com, user@spark.apache.org
spark-1.0.0
On Thu, Jan 1, 2015 at 12:04 PM, Josh Rosen rosenvi...@gmail.com
wow. Just figured it out:
conf.set("spark.executor.memory", "2g")
I have to set it in the Job… that's really counterintuitive. Especially
because the documentation in spark-env.sh says the exact opposite.
What's the resolution here? This seems like a mess. I'd propose a solution
to
It looks like 'null' might be selected as a block replication peer?
https://github.com/apache/spark/blob/v1.0.0/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L786
I know that we fixed some replication bugs in newer versions of Spark (such
as
Ok. Let me try out on a newer version.
Thank you!!
On Thu, Jan 1, 2015 at 12:17 PM, Josh Rosen rosenvi...@gmail.com wrote:
It looks like 'null' might be selected as a block replication peer?
Welcome to Spark. What's more fun is that this setting controls memory on the
executors, but if you want to set a memory limit on the driver you need to
configure it as a parameter of the spark-submit script. You also set
num-executors and executor-cores on the spark-submit call, as sketched below.
See both the Spark
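A sketch of those flags (the class, jar name, and values are placeholders;
--num-executors applies on YARN):

spark-submit --class com.fake.Test \
  --driver-memory 4g \
  --executor-memory 2g \
  --num-executors 4 \
  --executor-cores 2 \
  fake.jar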
Hi Tathagata,
It's a standalone cluster. The submit commands are:
== CLIENT
spark-submit --class com.fake.Test \
--deploy-mode client --master spark://fake.com:7077 \
fake.jar arguments
== CLUSTER
spark-submit --class com.fake.Test \
--deploy-mode cluster --master spark://fake.com:7077 \