I would like to define the names of my output files in Spark. I have a process
which writes many files and I would like to name them; is it possible? I
guess that it's not possible with the saveAsTextFile method.
It would be something similar to the MultipleOutputs of Hadoop.
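A common workaround is a custom Hadoop output format. A sketch, assuming an RDD of (fileName, line) pairs; saveAsHadoopFile and MultipleTextOutputFormat are real APIs, but the key-to-file mapping shown here is illustrative:

```scala
import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat
import org.apache.spark.{SparkConf, SparkContext}

// Write each value into a file named after its key, similar to Hadoop's
// MultipleOutputs; the key itself is dropped from the file contents.
class KeyAsFileNameFormat extends MultipleTextOutputFormat[Any, Any] {
  override def generateFileNameForKeyValue(key: Any, value: Any, name: String): String =
    key.asInstanceOf[String]
  override def generateActualKey(key: Any, value: Any): Any =
    NullWritable.get()
}

val sc = new SparkContext(new SparkConf().setAppName("named-outputs"))
val data = sc.parallelize(Seq(("errors", "line1"), ("info", "line2")))
data.saveAsHadoopFile("/tmp/out", classOf[String], classOf[String],
  classOf[KeyAsFileNameFormat])
```

With this, records keyed "errors" end up in a file named errors under /tmp/out, one file per distinct key.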
Hi,
I'm starting with Spark and I'm just trying to understand: if I want to
use Spark Streaming, should I feed it with Flume or Kafka? I think
there's no official Flume sink for Spark Streaming, and it seems
that Kafka fits better since it gives you reliability.
Could someone give a good…
…(from Flume or Kafka or something else) and make it available for a variety
of apps via Kafka.
Hope this helps!
Hari
On Wed, Nov 19, 2014 at 8:10 AM, Guillermo Ortiz konstt2...@gmail.com
wrote:
Thank you for your answer. I don't know if I phrased the question
correctly, but your answer helps me.
I'm going to ask the question again to check that you understood me.
I have this topology:
DataSource1, …, DataSourceN --> Kafka --> SparkStreaming
Hello,
I'm a newbie with Spark but I've been working with Hadoop for a while.
I have two questions.
Is there any case where MR is better than Spark? I don't know in which
cases I should use Spark rather than MR. When is MR faster than Spark?
The other question is: I know Java; is it worth it to learn
Hi,
I'm a newbie with Spark; I'm just trying to use Spark Streaming and
filter some data sent with a Java socket, but it's not working... it
works when I use ncat.
Why is it not working?
My Spark code is just this:
val sparkConf = new SparkConf().setMaster("local[2]").setAppName("Test")
val
which will be sent to the client whoever connects
on 12345, i have it tested and is working with SparkStreaming
(socketTextStream).
Thanks
Best Regards
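A minimal end-to-end sketch of the setup being discussed (port and filter string taken from the thread; socketTextStream is the TCP client side, so something like `ncat -lk 12345` must be listening first):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Connect as a client to a TCP server on port 12345 and print matching lines.
val sparkConf = new SparkConf().setMaster("local[2]").setAppName("Test")
val ssc = new StreamingContext(sparkConf, Seconds(5))
val lines = ssc.socketTextStream("localhost", 12345)
lines.filter(_.contains("h")).print()
ssc.start()
ssc.awaitTermination()
```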
On Fri, Dec 12, 2014 at 6:25 PM, Guillermo Ortiz konstt2...@gmail.com
wrote:
Hi,
I'm a newbie with Spark; I'm just trying to use
ak...@sigmoidanalytics.com wrote:
socketTextStream is Socket client which will read from a TCP ServerSocket.
Thanks
Best Regards
On Fri, Dec 12, 2014 at 7:21 PM, Guillermo Ortiz konstt2...@gmail.com
wrote:
I don't understand what Spark Streaming's socketTextStream is waiting for...
is it like
Why doesn't it work? I guess that it's the same with \n.
2014-12-13 12:56 GMT+01:00 Guillermo Ortiz konstt2...@gmail.com:
I got it, thanks. A silly question: why, if I do
out.write("hello" + System.currentTimeMillis() + "\n"); it doesn't
detect anything, and if I do
out.println("hello"
Thanks.
2014-12-14 12:20 GMT+01:00 Gerard Maas gerard.m...@gmail.com:
Are you using a buffered PrintWriter? That's probably a different flushing
behaviour. Try doing out.flush() after out.write(...) and you will get the
same result.
This is Spark unrelated btw.
-kr, Gerard.
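The difference Gerard describes can be seen without any network involved; a small sketch using a ByteArrayOutputStream as a stand-in for the socket:

```scala
import java.io.{BufferedOutputStream, ByteArrayOutputStream, PrintWriter}

// With autoFlush enabled, PrintWriter flushes on println (and printf/format)
// but NOT on write, so write()-only output sits in the buffer and the
// receiving end sees nothing until an explicit flush().
val sink = new ByteArrayOutputStream()
val out  = new PrintWriter(new BufferedOutputStream(sink), true) // autoFlush

out.write("hello " + System.currentTimeMillis() + "\n")
println(sink.size())      // 0: still buffered, nothing has been sent

out.println("hello " + System.currentTimeMillis())
println(sink.size() > 0)  // true: println triggered the flush
```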
I'm a newbie with Spark, a simple question:
val errorLines = lines.filter(_.contains("h"))
val mapErrorLines = errorLines.map(line => ("key", line))
val grouping = mapErrorLines.groupByKeyAndWindow(Seconds(8), Seconds(4))
I get something like:
604: ---
605:
and do something for each element.
}
I think it must be pretty basic, argh.
2014-12-17 18:43 GMT+01:00 Guillermo Ortiz konstt2...@gmail.com:
What I would like to do is count the number of elements and, if
it's greater than a number, iterate over all of them and store them
in MySQL.
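The count-then-store step can be sketched with foreachRDD, assuming `grouping` is the windowed DStream from the snippet above. The JDBC URL, table, and threshold are made-up; one connection is opened per partition rather than per record:

```scala
import java.sql.DriverManager

// For each window, write groups larger than a threshold to MySQL.
grouping.foreachRDD { rdd =>
  rdd.foreachPartition { part =>
    val conn = DriverManager.getConnection("jdbc:mysql://dbhost/logs", "user", "pass")
    val stmt = conn.prepareStatement("INSERT INTO error_lines(k, line) VALUES (?, ?)")
    part.foreach { case (key, values) =>
      if (values.size > 10) {               // hypothetical threshold
        values.foreach { v =>
          stmt.setString(1, key)
          stmt.setString(2, v)
          stmt.executeUpdate()
        }
      }
    }
    conn.close()
  }
}
```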
I'm trying to do some operations with windows and intervals.
I get data every 15 seconds, and want to have a window of 60 seconds
with batch intervals of 15 seconds.
I'm injecting data with ncat. If I inject 3 logs in the same interval,
I get into the "do something" every 15 seconds during one
the println(4...)?? Shouldn't it execute all
the code every 15 seconds, since that's what's defined on the context
(val ssc = new StreamingContext(sparkConf, Seconds(15)))?
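The timing being described can be sketched as follows (socket source and port are placeholders): with a 15 s batch interval, a 60 s window sliding every 15 s contains each record in 4 consecutive window computations, which is why the "do something" fires every 15 s for about a minute per burst of input.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(new SparkConf().setAppName("windows"), Seconds(15))
val lines = ssc.socketTextStream("localhost", 12345)

// 60 s of data, recomputed every 15 s: one record shows up in 4 windows.
val windowed = lines.window(Seconds(60), Seconds(15))
windowed.count().print()

ssc.start()
ssc.awaitTermination()
```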
2014-12-26 10:56 GMT+01:00 Guillermo Ortiz konstt2...@gmail.com:
I'm trying to make some operation with windows and intervals.
I
Oh, I didn't understand what I was doing, my fault (too many parties
this Christmas). I thought windows worked in another, weird way. Sorry for
the questions.
2014-12-26 13:42 GMT+01:00 Guillermo Ortiz konstt2...@gmail.com:
I'm trying to understand why it's not working, and I typed some printlns
to see what's happening.
2015-02-04 18:57 GMT+01:00 Sandy Ryza sandy.r...@cloudera.com:
Hi Guillermo,
What exactly do you mean by each iteration? Are you caching data in
memory?
-Sandy
On Wed, Feb 4, 2015 at 5:02 AM, Guillermo Ortiz konstt2...@gmail.com
wrote:
I execute a job in Spark where I'm
I execute a job in Spark where I'm processing a file of 80Gb in HDFS.
I have 5 slaves:
(32cores /256Gb / 7physical disks) x 5
I have been trying many different configurations with YARN.
yarn.nodemanager.resource.memory-mb 196Gb
yarn.nodemanager.resource.cpu-vcores 24
I have tried to execute the
. Though that wouldn't explain the high
GC. What percent of task time does the web UI report that tasks are
spending in GC?
On Fri, Feb 6, 2015 at 12:56 AM, Guillermo Ortiz konstt2...@gmail.com
wrote:
Yes, it's surprising to me as well.
I tried to execute it with different configurations
to me that you would be hitting a lot of GC for
this scenario. Are you setting --executor-cores and --executor-memory?
What are you setting them to?
-Sandy
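For nodes of this shape (24 vcores / 196Gb handed to YARN per node, 5 nodes), a sizing along these lines is a common starting point. All numbers here are hypothetical, not a recommendation from the thread; the class and jar names are placeholders:

```shell
# ~4 executors per node with 5 cores each leaves headroom for the
# ApplicationMaster and the OS; memory is sized to fit 4 executors
# plus per-executor overhead under the 196Gb NodeManager limit.
spark-submit \
  --master yarn-cluster \
  --num-executors 20 \
  --executor-cores 5 \
  --executor-memory 40g \
  --conf spark.yarn.executor.memoryOverhead=4096 \
  --class my.company.MyJob myJob.jar
```

Very large executors (e.g. one 24-core, 190g executor per node) tend to show exactly the long GC pauses discussed here, which is one reason to prefer several medium executors per node.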
On Thu, Feb 5, 2015 at 10:17 AM, Guillermo Ortiz konstt2...@gmail.com
wrote:
Any idea why if I use more containers I get a lot
Hi,
I want to process some files; they're kind of big, dozens of
gigabytes each one. I get them as an array of bytes and there's a
structure inside them.
I have a header which describes the structure. It could be like:
Number(8 bytes) Char(16 bytes) Number(4 bytes) Char(1 byte), ..
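Decoding one record with that layout can be sketched with a ByteBuffer; the field names, charset, and byte order are assumptions, since the thread doesn't specify them:

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets

// Hypothetical decoder for the layout described above:
// Number(8 bytes) Char(16 bytes) Number(4 bytes) Char(1 byte).
def parseRecord(bytes: Array[Byte]): (Long, String, Int, Char) = {
  val buf = ByteBuffer.wrap(bytes)   // big-endian by default
  val n1 = buf.getLong()
  val chars = new Array[Byte](16)
  buf.get(chars)
  val n2 = buf.getInt()
  val c = buf.get().toChar
  (n1, new String(chars, StandardCharsets.US_ASCII).trim, n2, c)
}
```

A function like this can then be mapped over the records once the byte stream has been split at record boundaries.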
Any idea why, if I use more containers, I get a lot of them stopped because of GC?
2015-02-05 8:59 GMT+01:00 Guillermo Ortiz konstt2...@gmail.com:
I'm not caching the data. With "each iteration" I mean each 128Mb
that an executor has to process.
The code is pretty simple.
final Conversor c = new
I'm trying to execute Spark from a Hadoop Cluster, I have created this
script to try it:
#!/bin/bash
export HADOOP_CONF_DIR=/etc/hadoop/conf
SPARK_CLASSPATH=
for lib in `ls /user/local/etc/lib/*.jar`
do
SPARK_CLASSPATH=$SPARK_CLASSPATH:$lib
done
When I try to execute my task with Spark it starts to copy the jars it
needs to HDFS and it finally fails, I don't know exactly why. I have
checked HDFS and it copies the files, so, it seems to work that part.
I changed the log level to debug but there's nothing else to help.
What else does Spark
I was adding some bad jars, I guess. I deleted all the jars and copied
them again, and now it works.
2015-01-08 14:15 GMT+01:00 Guillermo Ortiz konstt2...@gmail.com:
When I try to execute my task with Spark it starts to copy the jars it
needs to HDFS and it finally fails, I don't know exactly why. I
I'm trying to execute a query with Spark.
(Example from the Spark Documentation)
val teenagers = people.where('age >= 10).where('age <= 19).select('name)
Is it possible to execute an OR with this syntax?
val teenagers = people.where('age >= 10 or 'age <= 4).where('age <=
19).select('name)
I have
thanks, it works.
2015-03-03 13:32 GMT+01:00 Cheng, Hao hao.ch...@intel.com:
Use where('age >= 10 || 'age <= 4) instead.
-Original Message-
From: Guillermo Ortiz [mailto:konstt2...@gmail.com]
Sent: Tuesday, March 3, 2015 5:14 PM
To: user
Subject: SparkSQL, executing an OR
I'm trying
Which is the equivalent function to the Combiners of MapReduce in Spark?
I guess that it's combineByKey, but is combineByKey executed locally?
I understand that functions such as reduceByKey or foldByKey aren't executed locally.
Reading the documentation, it looks like combineByKey is equivalent to
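For what it's worth, reduceByKey does perform map-side (local) aggregation before the shuffle, which makes it the closest analogue of an MR combiner; combineByKey is the general form it is built on. A sketch, assuming `pairs` is an RDD[(String, Int)]:

```scala
// Partial sums are computed per partition (the "combiner" step),
// then only the partial results are shuffled.
val counts = pairs.reduceByKey(_ + _)

// The same thing spelled out with combineByKey:
val sums = pairs.combineByKey(
  (v: Int) => v,                  // createCombiner: first value per key, per partition
  (c: Int, v: Int) => c + v,      // mergeValue: runs locally, before the shuffle
  (c1: Int, c2: Int) => c1 + c2)  // mergeCombiners: merges partials after the shuffle
```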
you tried it on your 300-machine cluster? I'm curious to know what
happened.
-Mosharaf
On Mon, Feb 23, 2015 at 8:06 AM, Guillermo Ortiz konstt2...@gmail.com
wrote:
I'm looking for information about how broadcast variables scale in Spark
and what algorithm they use.
I have found
http://www.cs.berkeley.edu/~agearh/cs267.sp10/files/mosharaf-spark-bc-report-spring10.pdf
I don't know if it's talking about the current version (1.2.1)
because the file was created in 2010.
I
I have a question,
If I execute this code,
val users = sc.textFile("/tmp/users.log").map(x => x.split(",")).map(
v => (v(0), v(1)))
val contacts = sc.textFile("/tmp/contacts.log").map(y =>
y.split(",")).map(v => (v(0), v(1)))
val usersMap = contacts.collectAsMap()
contacts.map(v => (v._1, (usersMap(v._1),
are right that this is mostly
because joins usually involve shuffles. If not, it's not as clear
which way is best. I suppose that if the Map is large-ish, it's safer
to not keep pulling it to the driver.
On Thu, Feb 26, 2015 at 10:00 AM, Guillermo Ortiz konstt2...@gmail.com
wrote:
I have
is
a local object. This bit has nothing to do with Spark.
Yes, you would have to broadcast it to use it efficiently in functions
(not on the driver).
On Thu, Feb 26, 2015 at 10:24 AM, Guillermo Ortiz konstt2...@gmail.com
wrote:
So, on my example, when I execute:
val usersMap
the copy in the driver.
On Thu, Feb 26, 2015 at 10:47 AM, Guillermo Ortiz konstt2...@gmail.com
wrote:
Isn't contacts.map(v => (v._1, (usersMap(v._1), v._2))).collect()
executed in the executors? Why is it executed in the driver?
contacts is not a local object, right?
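The two options being discussed can be sketched side by side, assuming `users` and `contacts` are the RDD[(String, String)] pairs from the quoted code. collectAsMap pulls the map to the driver, and referencing it in a closure ships a copy with every task; broadcasting ships it once per executor:

```scala
// Pulled to the driver once; without broadcast, the closure below would
// serialize this map into every task.
val usersMap = users.collectAsMap()

// Shipped to each executor once and reused by all its tasks.
val usersBc = sc.broadcast(usersMap)

val joined = contacts.map { case (k, v) => (k, (usersBc.value(k), v)) }
```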
I want to read from many topics in Kafka and know from where each message
is coming (topic1, topic2 and so on).
val kafkaParams = Map[String, String]("metadata.broker.list" ->
"myKafka:9092")
val topics = Set("EntryLog", "presOpManager")
val directKafkaStream = KafkaUtils.createDirectStream[String,
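One way to recover the topic of each message, sketched against the 0.8 direct API and assuming the `ssc`, `kafkaParams`, and `topics` values above: the RDDs produced by the direct stream carry offset ranges whose order matches the RDD's partitions.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils}

val directKafkaStream = KafkaUtils.createDirectStream[String, String,
  StringDecoder, StringDecoder](ssc, kafkaParams, topics)

// Tag every message with the topic its partition came from.
val withTopic = directKafkaStream.transform { rdd =>
  val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  rdd.mapPartitionsWithIndex { (i, iter) =>
    val topic = ranges(i).topic
    iter.map { case (_, msg) => (topic, msg) }
  }
}
```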
I'm trying to execute the Hello World example with Spark + Kafka (
https://github.com/apache/spark/blob/master/examples/scala-2.10/src/main/scala/org/apache/spark/examples/streaming/DirectKafkaWordCount.scala)
with createDirectStream and I get this error.
java.lang.NoSuchMethodError:
Sorry, I had a duplicated Kafka dependency with an older version in
another pom.xml.
2015-05-05 14:46 GMT+02:00 Guillermo Ortiz konstt2...@gmail.com:
I'm trying to execute the Hello World example with Spark + Kafka (
https://github.com/apache/spark/blob/master/examples/scala-2.10/src/main
Hi,
I have two streaming RDDs, RDD1 and RDD2, and want to cogroup them.
The data don't arrive at the same time and sometimes they can come with
some delay.
When I have all the data I want to insert it into MongoDB.
For example, imagine that I get:
RDD1 --> T 0
RDD2 --> T 0.5
I do a cogroup between them but I couldn't
(splitRegister.length) = 1
splitRegister.copyToArray(newArray)
}
(splitRegister(1), newArray)
}
If I check the length of splitRegister, it is always 2 in each slide; it is
never three.
2015-05-18 15:36 GMT+02:00 Guillermo Ortiz konstt2...@gmail.com:
Hi,
I have two streaming RDD1
Hi,
I'm executing a Spark Streaming job with Kafka. The code was working, but
today I tried to execute the code again and I got an exception; I don't know
what's happening. Right now there are no jobs executing on YARN.
How could I fix it?
Exception in thread main
stateful operations.
2. Could you try not using the SPARK_CLASSPATH environment variable.
TD
On Sat, Jun 27, 2015 at 1:00 AM, Guillermo Ortiz konstt2...@gmail.com
wrote:
I don't have any checkpoint in my code. Really, I don't have to save any
state; it's just log processing for a PoC.
I have
: Requested user hdfs is not whitelisted and has id 496, which
is below the minimum allowed 1000
Container exited with a non-zero exit code 255
Failing this attempt. Failing the application.
2015-06-27 11:25 GMT+02:00 Guillermo Ortiz konstt2...@gmail.com:
Well, SPARK_CLASSPATH is just a random name
, or
otherwise?
Also cc'ed Hari who may have a better idea of YARN related issues.
On Sat, Jun 27, 2015 at 12:35 AM, Guillermo Ortiz konstt2...@gmail.com
wrote:
Hi,
I'm executing a Spark Streaming job with Kafka. The code was working, but
today I tried to execute the code again and I got
that. Mind renaming that variable and trying it out
again? At least it will reduce one possible source of problems.
TD
On Sat, Jun 27, 2015 at 2:32 AM, Guillermo Ortiz konstt2...@gmail.com
wrote:
I'm checking the logs in YARN and I found this error as well
Application
Hi,
I'm trying to connect to two Kafka topics with Spark with DirectStream,
but I get an error. I don't know if there's any limitation on doing it,
because when I access just one topic everything is fine.
*val ssc = new StreamingContext(sparkConf, Seconds(5))*
*val kafkaParams =
I'm using Spark Streaming and I want to configure checkpointing to manage
fault tolerance.
I've been reading the documentation. Is it necessary to create and
configure the InputDStream in the getOrCreate function?
I checked the example in
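The answer the documentation gives is yes: the whole DStream graph must be built inside the factory function, because on restart the graph is rebuilt from the checkpoint rather than from your setup code. A sketch (the checkpoint path is a placeholder, and a socket source stands in for the real input, e.g. a Kafka direct stream):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val checkpointDir = "/tmp/checkpoint"   // hypothetical path

def createContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("CheckpointedApp")
  val ssc = new StreamingContext(conf, Seconds(5))
  // All DStreams, including the input, must be created here.
  val lines = ssc.socketTextStream("localhost", 12345)
  lines.print()
  ssc.checkpoint(checkpointDir)
  ssc
}

// Recovers from the checkpoint if present; otherwise builds a fresh context.
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()
```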
I have a problem with the JobScheduler. I have executed the same code in
two clusters. I read from three topics in Kafka with DirectStream, so I
have three tasks.
I have checked YARN and there aren't more jobs launched.
In the cluster where I have trouble I got these logs:
15/07/30 14:32:58 INFO
I read about the maxRatePerPartition parameter; I haven't set it.
Could it be the problem? Although this wouldn't explain why it doesn't
work in one of the clusters.
2015-07-30 14:47 GMT+02:00 Guillermo Ortiz konstt2...@gmail.com:
They just share the Kafka; the rest of the resources are independent. I
tried to stop one cluster and execute only the cluster that isn't working,
but the same thing happens.
2015-07-30 14:41 GMT+02:00 Guillermo Ortiz konstt2...@gmail.com:
I have some problem with the JobScheduler. I have executed same
at
MetricsSpark.scala:67, took 60.391761 s
15/07/30 14:37:35 INFO DAGScheduler: Job 93 finished: foreachRDD at
MetricsSpark.scala:67, took 0.531323 s
Are those jobs running on the same topicpartition?
On Thu, Jul 30, 2015 at 8:03 AM, Guillermo Ortiz konstt2...@gmail.com
wrote:
I read about
, 2015 at 10:46 AM, Guillermo Ortiz konstt2...@gmail.com
wrote:
The difference is that one receives more data than the other two. I can
pass the topics through parameters, so I could execute the code
with one topic and figure out which one it is, although I guess that
it's
the results.
On Thu, Jul 30, 2015 at 9:29 AM, Guillermo Ortiz konstt2...@gmail.com
wrote:
I have three topics with one partition each, so each job runs about
one topic.
2015-07-30 16:20 GMT+02:00 Cody Koeninger c...@koeninger.org:
Just so I'm clear, the difference in timing you're talking
:15 GMT+02:00 Guillermo Ortiz konstt2...@gmail.com:
It doesn't make sense to me, because the other cluster processes all the
data in less than a second.
Anyway, I'm going to set that parameter.
2015-07-31 0:36 GMT+02:00 Tathagata Das t...@databricks.com:
Yes, and that is indeed the problem
I'm executing a job with Spark Streaming and I get this error every time
once the job has been executing for a while (usually hours or days).
I have no idea why it's happening.
15/07/30 13:02:14 ERROR LiveListenerBus: Listener EventLoggingListener
threw an exception
I can't manage to activate the logs for my classes. I'm using CDH 5.4 with
Spark 1.3.0.
I have a class in Scala with some log.debug calls; I created a class to log:
package example.spark
import org.apache.log4j.Logger
object Holder extends Serializable {
@transient lazy val log =
I'm trying to index documents to Solr from Spark with the solr-spark library.
I have created a project with Maven and include all the dependencies when I
execute Spark, but I get a ClassNotFoundException. I have checked that the
class is in one of the jars that I'm including (solr-solrj-4.10.3.jar).
I
)
at
org.apache.solr.common.cloud.ZkStateReader.createClusterStateWatchersAndUpdate(ZkStateReader.java:334)
at
org.apache.solr.client.solrj.impl.CloudSolrServer.connect(CloudSolrServer.java:243)
2015-12-16 16:26 GMT+01:00 Guillermo Ortiz <konstt2...@gmail.com>:
> I'm trying to index document to Solr from Spark with the library solr-spark
>
>
streaming-programming-guide.html#checkpointing
>
> On Mon, Nov 30, 2015 at 9:38 AM, Guillermo Ortiz <konstt2...@gmail.com>
> wrote:
>
>> Hello,
>>
>> I have Spark and Kafka with directStream. I'm trying that if Spark dies
>> it could process all those messages
Hello,
I have Spark and Kafka with directStream. I'm trying to get Spark, if it
dies, to process all those messages when it starts again. The offsets are
stored in checkpoints, but I don't know how I could tell Spark to start at
that point.
I saw that there's another createDirectStream method with a
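That other overload takes explicit starting offsets plus a message handler. A sketch against the 0.8 direct API, assuming `ssc` and `kafkaParams` from the earlier snippets; the topic name and offset here are made-up, and in practice you load them from your own store (or let checkpoint recovery restore them):

```scala
import kafka.common.TopicAndPartition
import kafka.message.MessageAndMetadata
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils

// Resume each partition from a stored offset instead of latest/earliest.
val fromOffsets: Map[TopicAndPartition, Long] =
  Map(TopicAndPartition("logs", 0) -> 42L)   // hypothetical topic and offset

val stream = KafkaUtils.createDirectStream[String, String, StringDecoder,
  StringDecoder, (String, String)](
  ssc, kafkaParams, fromOffsets,
  (m: MessageAndMetadata[String, String]) => (m.key, m.message))
```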
I use Spark Streaming with Kafka and I'd like to know how many consumers
are generated. I guess as many as there are partitions in Kafka, but I'm
not sure.
Is there a way to know the name of the groupId generated in Spark to Kafka?
;> You could easily model your data as an RDD or tuples (or as a
>> dataframe/set) and use the sortBy (or orderBy for dataframe/sets)
>> methods.
>>
>> best,
>> --Jakob
>>
>> On Wed, Feb 24, 2016 at 2:26 PM, Guillermo Ortiz <konstt2...@gmail.c
When you do a join in Spark, how many partitions does the result have? Is
it a default number if you don't specify the number of partitions?
thm implementation.
>
> ---
> Robin East
> *Spark GraphX in Action* Michael Malak and Robin East
> Manning Publications Co.
> http://www.manning.com/books/spark-graphx-in-action
>
>
>
>
>
&
;
> On Thu, Feb 25, 2016 at 7:42 PM, Guillermo Ortiz <konstt2...@gmail.com>
> wrote:
>
>> When you do a join in Spark, how many partitions are as result? is it a
>> default number if you don't specify the number of partitions?
>>
>
>
>
> --
> ---
> Takeshi Yamamuro
>
Oh, the letters were just an example; it could be:
a , t
b , o
t , k
k , c
So a -> t -> k -> c, and the result is: a,c; t,c; k,c and b,o.
I don't know if you were thinking about sortBy because of the other example,
where the letters were consecutive.
2016-02-25 9:42 GMT+01:00 Guillermo Ortiz
; (a-b) -> (a-b-c, a-b-e)
> (b-c) -> (a-b-c, b-c-d)
> (c-d) -> (b-c-d)
> (b-e) -> (b-e-f)
> (e-f) -> (b-e-f, e-f-c)
> (f-c) -> (e-f-c)
>
> filter out keys with less than 2 values
>
> (b-c) -> (a-b-c, b-c-d)
> (e-f) -> (b-e-f, e-f-c)
>
> ma
> partition.
>
>
>
> Cheers,
>
> Ximo
>
>
>
> *De:* Guillermo Ortiz [mailto:konstt2...@gmail.com]
> *Enviado el:* jueves, 25 de febrero de 2016 15:19
> *Para:* Takeshi Yamamuro <linguin@gmail.com>
> *CC:* user <user@spark.apache.org>
> *Asunto
m my Verizon Wireless 4G LTE smartphone
>
>
> ---- Original message
> From: Guillermo Ortiz <konstt2...@gmail.com>
> Date: 02/24/2016 5:26 PM (GMT-05:00)
> To: user <user@spark.apache.org>
> Subject: How could I do this algorithm in Spark?
>
> I want to do some
I'm new to GraphX. I need to get the vertices without out edges.
I guess that it's pretty easy, but I did it in a pretty complicated and
inefficient way.
val vertices: RDD[(VertexId, (List[String], List[String]))] =
sc.parallelize(Array((1L, (List("a"), List[String]())),
(2L, (List("b"),
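A much shorter route, sketched here assuming a `graph` already built from vertices and edges like the ones above: `outDegrees` only contains vertices whose out-degree is greater than zero, so subtracting its keys from the vertex set leaves exactly the vertices with no outgoing edges.

```scala
// Vertices with no outgoing edges: all vertices minus those that
// appear in outDegrees (which omits degree-0 vertices).
val noOutEdges = graph.vertices.subtractByKey(graph.outDegrees)
```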
es == 0
>
> )
>
>
>
> val verticesWithNoOutEdges = graphWithNoOutEdges.vertices
>
>
>
> Mohammed
>
> Author: Big Data Analytics with Spark
> <http://www.amazon.com/Big-Data-Analytics-Spark-Practitioners/dp/1484209656/>
>
>
>
> *From:* Guillermo Ortiz [mailto:kon
East
> *Spark GraphX in Action* Michael Malak and Robin East
> Manning Publications Co.
> http://www.manning.com/books/spark-graphx-in-action
>
>
>
>
>
> On 26 Feb 2016, at 11:59, Guillermo Ortiz <konstt2...@gmail.com> wrote:
>
> I'm new with graphX. I need to get
I want to implement an algorithm in Spark. I know how to do it on a single
machine where all the data is together, but I don't know a good way to do
it in Spark.
If someone has an idea...
I have some data like this
a , b
x , y
b , c
y , y
c , d
I want something like:
a , d
b , d
c , d
x , y
y , y
I
I'm using Spark Streaming and Kafka with the direct approach. I have
created a topic with 6 partitions, so when I execute Spark the RDD has six
partitions. I understand that ideally there should be six executors so that
each one processes one partition. To do it, when I execute spark-submit (I
use YARN) I specify the
on executor throws ClassNotFoundException on
> driver
>
> FYI
>
> On Thu, Jan 21, 2016 at 7:10 AM, Guillermo Ortiz <konstt2...@gmail.com>
> wrote:
>
>> I'm using CDH 5.5.1 with Spark 1.5.x (I think that it's 1.5.2).
>>
>> I know that the library is here:
I'm running a Spark Streaming process and it stops after a while. It does
some processing and inserts the result in ElasticSearch with its library.
After a while the process fails.
I have been checking the logs and I have seen this error:
2016-01-21 14:57:54,388
>
> Which Spark version are you using ?
>
> Cheers
>
> On Thu, Jan 21, 2016 at 6:50 AM, Guillermo Ortiz <konstt2...@gmail.com>
> wrote:
>
>> I'm running a Spark Streaming process and it stops after a while. It does
>> some processing and inserts the result in Elastic
I think that it's that bug, because the error is the same. Thanks a lot.
2016-01-21 16:46 GMT+01:00 Guillermo Ortiz <konstt2...@gmail.com>:
> I'm using 1.5.0 of Spark, confirmed. Except this
> jar: file:/opt/centralLogs/lib/spark-catalyst_2.10-1.5.1.jar.
>
> I'm going to keep l
I have a DirectStream and process data from Kafka,
val directKafkaStream = KafkaUtils.createDirectStream[String, String,
StringDecoder, StringDecoder](ssc, kafkaParams1, topics1.toSet)
directKafkaStream.foreachRDD { rdd =>
val offsets = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
When
I'm curious about what kinds of things are saved in the checkpoints.
I just changed the number of executors when executing Spark, and it didn't
take effect until I removed the checkpoint. I guess that if I'm using
log4j.properties and I want to change it, I have to remove the checkpoint
as well.
When you
I'm trying to configure log4j in Spark.
spark-submit --conf spark.metrics.conf=metrics.properties --name
"myProject" --master yarn-cluster --class myCompany.spark.MyClass *--files
/opt/myProject/conf/log4j.properties* --jars $SPARK_CLASSPATH
--executor-memory 1024m --num-executors 5
cutors 5
--executor-cores 1 --driver-memory 1024m *--files
/opt/myProject/conf/log4j.properties* /opt/myProject/myJar.jar
I think I didn't do any others changes.
2016-03-30 15:42 GMT+02:00 Guillermo Ortiz <konstt2...@gmail.com>:
> I'm trying to configure log4j in Sp
I'm trying to read data from Spark and index it to ES with its library
(es-hadoop, version 2.2.1).
It was working fine for a while, but now this error has started happening.
I have deleted the checkpoint and even the Kafka topic, and restarted all
the machines with Kafka and ZooKeeper, but it didn't
I think that it's a Kafka error, but I'm starting to think it could
be something about Elasticsearch, since I have seen more people with the
same error using Elasticsearch. I have no idea.
2016-05-06 11:05 GMT+02:00 Guillermo Ortiz <konstt2...@gmail.com>:
> I'm trying to read data f
[JobGenerator] INFO
org.apache.spark.streaming.scheduler.JobScheduler - Added jobs for
time 146252629 ms
2016-05-06 11:18:10,015 [JobGenerator] INFO
org.apache.spark.streaming.scheduler.JobGenerator - Checkpointing
graph for time 146252629 ms
2016-05-06 11:11 GMT+02:00 Guillermo Ortiz <kons
I'm trying to execute a job with Spark and Kafka and I'm getting this error.
I know that it's because the versions are not right, but I have been
checking the jars which I import (in the Spark UI, spark.yarn.secondary.jars)
and they are right, and the class exists inside kafka_2.10-0.8.2.1.jar.
earch.
>
> On Fri, May 6, 2016 at 4:22 AM, Guillermo Ortiz <konstt2...@gmail.com>
> wrote:
> > This is the complete error.
> >
> > 2016-05-06 11:18:05,424 [task-result-getter-0] INFO
> > org.apache.spark.scheduler.TaskSetManager - Finished task 5.0 in stage
> &g
:36 CET 2015 kafka/javaapi/TopicMetadataRequest.class
2135 Thu Feb 26 14:30:38 CET 2015
kafka/server/KafkaApis$$anonfun$handleTopicMetadataRequest$1.class
2016-05-09 12:51 GMT+02:00 Guillermo Ortiz <konstt2...@gmail.com>:
> I'm trying to execute a job with Spark and Kafka and I'
ersion of kafka is embedded in any of the
> jars listed below.
>
> Cheers
>
>
> On Mon, May 9, 2016 at 4:00 AM, Guillermo Ortiz <konstt2...@gmail.com>
> wrote:
>
>> *jar tvf kafka_2.10-0.8.2.1.jar | grep TopicMetadataRequest *
>> 1757 Thu Feb 26 14:30:34 CET
picMetadataRequest
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn *
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> h
I'm wondering about using Flume (file channel) -> Spark Streaming.
I have some doubts about it:
1. The RDD size is all the data that arrives in a microbatch you have
defined, right?
2. If there are 2Gb of data, how many RDDs are generated? Just one, and I
have to make a repartition?
3. When is the ACK
Avro sink --> Spark Streaming
2017-01-16 13:55 GMT+01:00 ayan guha <guha.a...@gmail.com>:
> With Flume, what would be your sink?
>
>
>
> On Mon, Jan 16, 2017 at 10:44 PM, Guillermo Ortiz <konstt2...@gmail.com>
> wrote:
>
>> I'm wondering to use Flume (c
Hello,
I'm using Spark 2.0 and Cassandra. Is there any util to make unit testing
easy, or what would be the best way to do it? A library? Cassandra with
Docker?
nation.
>
>
>
> 2018-01-17 16:48 GMT+01:00 Guillermo Ortiz <konstt2...@gmail.com>:
>
>> Hello,
>>
>> I'm using spark 2.0 and Cassandra. Is there any util to make unit test
>> easily or which one would be the best way to do it? library? Cassandra with
>>
uld the blockage be in their compute
> creation instead of their caching?
>
> Thanks,
> Sonal
> Nube Technologies <http://www.nubetech.co>
>
> <http://in.linkedin.com/in/sonalgoyal>
>
>
>
> On Thu, Aug 23, 2018 at 6:38 PM, Guillermo Ortiz
> wrote:
>
&
Another test I just did is to execute with local[X], and this problem
doesn't happen. Communication problems?
2018-08-23 22:43 GMT+02:00 Guillermo Ortiz :
> it's a complex DAG before the point where I cache the RDD; there are some
> joins, filters and maps before caching the data, but most of the
I use Spark caching with the persist method. I have several RDDs that I
cache, but some of them are pretty small (about 300 kbytes). Most of the
time it works well and the whole job usually lasts 1s, but sometimes it
takes about 40s to store 300 kbytes in the cache.
If I go to the SparkUI->Cache, I can see
I have many Spark processes; some of them are pretty simple and they barely
have any messages to process, but they were developed with the same
archetype and they use Spark.
Some of them are executed with many executors, but for a few of them it
doesn't make sense to process with more than 2-4 cores in only
I'm trying to integrate the Schema Registry with Spark Streaming. For the
moment I want to use GenericRecords. It seems that my producer works and
new schemas are published in the _schemas topic. When I try to read with my
consumer, I'm not able to deserialize the data.
How could I tell Spark that