Hi Jeff,
Can you provide more information about how you are running your job? In
particular:
- which cluster manager are you using? Is it YARN, Mesos, or Spark
Standalone?
- which configuration options are you using to submit the job? In
particular, are you using dynamic allocation or external
Hi all,
I would also like to participate in that.
Greetings,
Juan
On Fri, Dec 9, 2016 at 6:03 AM, Michael Stratton <
michael.strat...@komodohealth.com> wrote:
> That sounds great, please include me so I can get involved.
>
> On Fri, Dec 9, 2016 at 7:39 AM, Marco Mistroni
sxw...@gmail.com>:
> Looks like you return "Some(null)" in "compute". If you don't want to
> create an RDD, it should return None. If you want to return an empty RDD, it
> should return "Some(sc.emptyRDD)".
>
> Best Regards,
> Shixiong Zh
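For illustration, a minimal sketch of a compute method following that advice;
the queue of pending batches and the SparkContext reference (sc) are
placeholder names, not taken from the original code:

// Inside a custom InputDStream[A]: emit nothing, or an empty RDD, when
// there is no data for this batch interval, but never Some(null).
override def compute(validTime: Time): Option[RDD[A]] = synchronized {
  if (pendingBatches.isEmpty) {
    None                              // no RDD for this interval
    // or: Some(sc.emptyRDD[A])       // an explicitly empty RDD
  } else {
    Some(sc.parallelize(pendingBatches.dequeue()))
  }
}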
Hi,
I've developed a ScalaCheck property for testing Spark Streaming
transformations. To do that I had to develop a custom InputDStream, which
is very similar to QueueInputDStream but has a method for adding new test
cases for dstreams, which are objects of type Seq[Seq[A]], to the DStream.
You
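In case it is useful, a rough sketch of that custom InputDStream; the names
are simplified and the real version has more bookkeeping:

import scala.collection.mutable
import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{StreamingContext, Time}
import org.apache.spark.streaming.dstream.InputDStream

class DynamicQueueInputDStream[A: ClassTag](_ssc: StreamingContext)
  extends InputDStream[A](_ssc) {

  // each element of the queue is the data for one batch interval
  private val batches = mutable.Queue.empty[Seq[A]]

  // add a test case, i.e. a sequence of batches, to the stream
  def addTestCase(testCase: Seq[Seq[A]]): Unit = synchronized {
    batches ++= testCase
  }

  override def start(): Unit = {}
  override def stop(): Unit = {}

  override def compute(validTime: Time): Option[RDD[A]] = synchronized {
    if (batches.isEmpty) Some(_ssc.sparkContext.emptyRDD[A])
    else Some(_ssc.sparkContext.parallelize(batches.dequeue()))
  }
}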
Hi,
I've had some trouble developing a Specs2 matcher that checks that a
predicate holds for all the elements of an RDD, and using it for testing a
simple Spark Streaming program. I've finally been able to get code that
works; you can see it in
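The core of the check, leaving the Specs2 plumbing aside, amounts to
something like this (a sketch; counting the failing elements keeps the check
distributed instead of collecting the RDD to the driver):

import org.apache.spark.rdd.RDD

// true iff the predicate holds for every element of the RDD
def forallRDD[A](rdd: RDD[A])(p: A => Boolean): Boolean =
  rdd.filter(x => !p(x)).count() == 0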
Hi,
Just my two cents. I understand your problem is that you have messages
with the same key in two different dstreams. What I would do is make a
union of all the dstreams with StreamingContext.union or several calls to
DStream.union, and then I would create a pair
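In code that would be roughly the following (a sketch; extractKey stands for
whatever derives the key from your records):

// Merge all the input DStreams into one and key the records, so a single
// keyed DStream can be grouped, reduced or joined downstream.
val unified = ssc.union(Seq(dstream1, dstream2))   // or dstream1.union(dstream2)
val keyed   = unified.map(record => (extractKey(record), record))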
Hi,
I'm using a simple Akka actor to create an actorStream. The actor just
forwards the messages received to the stream by
calling super[ActorHelper].store(msg). This works ok when I create the
stream with
ssc.actorStream[A](Props(new ProxyReceiverActor[A]), receiverActorName)
but when I try to
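For reference, the proxy actor is essentially the following (a simplified
sketch; store comes from the ActorHelper trait):

import scala.reflect.ClassTag
import akka.actor.Actor
import org.apache.spark.streaming.receiver.ActorHelper

// Forwards every message it receives to the Spark Streaming receiver.
class ProxyReceiverActor[A: ClassTag] extends Actor with ActorHelper {
  override def receive: Receive = {
    case msg => store(msg.asInstanceOf[A])
  }
}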
Hi,
I have noticed that when StreamingContext.stop is called before any receiver
has started, the context is not really stopped. Watching the logs
it looks like a stop signal is sent to 0 receivers, because the receivers
have not started yet, and then the receivers are started and the
in Spark 1.3.1 and Spark 1.4.0. Here
https://gist.github.com/juanrh/eaf34cf0a308a87db32c you have the corrected
example in case someone is interested.
Greetings,
Juan
2015-07-10 18:27 GMT+02:00 Juan Rodríguez Hortalá
juan.rodriguez.hort...@gmail.com:
Hi,
I'm trying to create a Spark Streaming
Hi,
I'm trying to create a Spark Streaming actor stream, but I'm having several
problems. First of all, the guide from
https://spark.apache.org/docs/latest/streaming-custom-receivers.html refers
to the code
. Can you make it a JIRA?
TD
On Tue, Jun 23, 2015 at 7:59 AM, Juan Rodríguez Hortalá
juan.rodriguez.hort...@gmail.com wrote:
Hi,
I'm running a program in Spark 1.4 where several Spark Streaming contexts
are created from the same Spark context. As pointed out in
https://spark.apache.org/docs
Hi,
I'm running a program in Spark 1.4 where several Spark Streaming contexts
are created from the same Spark context. As pointed out in
https://spark.apache.org/docs/latest/streaming-programming-guide.html each
Spark Streaming context is stopped before creating the next Spark Streaming
context. The
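Concretely, each context is stopped without stopping the shared SparkContext,
along these lines (a sketch; sc is the shared SparkContext, and the batch
interval and timeout are illustrative):

import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc1 = new StreamingContext(sc, Seconds(1))
// ... define the first job's DStreams and output operations ...
ssc1.start()
ssc1.awaitTerminationOrTimeout(10000)
ssc1.stop(stopSparkContext = false)        // keep sc alive for the next context

val ssc2 = new StreamingContext(sc, Seconds(1))   // next context, same SparkContext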
Perfect! I'll start working on it
2015-06-13 2:23 GMT+02:00 Amit Ramesh a...@yelp.com:
Hi Juan,
I have created a ticket for this:
https://issues.apache.org/jira/browse/SPARK-8337
Thanks!
Amit
On Fri, Jun 12, 2015 at 3:17 PM, Juan Rodríguez Hortalá
juan.rodriguez.hort...@gmail.com
Hi,
If you want, I would be happy to work on this. I have worked with
KafkaUtils.createDirectStream before, in a pull request that wasn't
accepted, https://github.com/apache/spark/pull/5367. I'm fluent in Python
and I'm starting to feel comfortable with Scala, so if someone opens a JIRA
I can
Hi,
I agree with Evo, Spark works at a different abstraction level than Storm,
and there is not a direct translation from Storm topologies to Spark
Streaming jobs. I think something remotely close is the notion of lineage
of DStreams or RDDs, which is similar to a logical plan of an engine like
Hi,
You can use the method repartition from DStream (for the Scala API) or
JavaDStream (for the Java API)
def repartition(numPartitions: Int): DStream[T]
https://spark.apache.org/docs/latest/api/scala/org/apache/spark/streaming/dstream/DStream.html
Return a new DStream with an increased or
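For example (a sketch; the source stream and the target number of partitions
are illustrative):

// Redistribute the records of every batch over 10 partitions before the
// expensive transformations, to increase parallelism downstream.
val repartitioned = inputDStream.repartition(10)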
Hi,
Maybe you could use streamingContext.fileStream like in the example from
https://spark.apache.org/docs/latest/streaming-programming-guide.html#input-dstreams-and-receivers,
you can read from files on any file system compatible with the HDFS API
(that is, HDFS, S3, NFS, etc.). You could split
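For example, something like this (a sketch; the directory is a placeholder,
and textFileStream is the text-only shortcut for fileStream):

// Watch a directory on an HDFS-compatible filesystem; every new file that
// appears there is read as part of the next batch.
val lines = ssc.textFileStream("hdfs:///path/to/incoming")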
everything inside Spark if possible.
Maybe I'm missing something here; any idea would be appreciated.
Thanks in advance for your help,
Greetings,
Juan Rodriguez
2015-02-18 20:23 GMT+01:00 Juan Rodríguez Hortalá
juan.rodriguez.hort...@gmail.com:
Hi Sean,
Thanks a lot for your answer
and https://issues.apache.org/jira/browse/SPARK-2444, and now it works ok.
Greetings,
Juan
2015-02-23 10:40 GMT+01:00 Juan Rodríguez Hortalá
juan.rodriguez.hort...@gmail.com:
Hi,
I'm having problems using pipe() from a Spark program written in Java,
where I call a python script, running in a YARN
Hi,
I'm having problems using pipe() from a Spark program written in Java,
where I call a python script, running in a YARN cluster. The problem is
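The call itself is just a plain pipe over the RDD, roughly like this (a
sketch; the script path is illustrative):

// Each partition is fed line by line to the external script's stdin, and
// the script's stdout becomes the resulting RDD of strings.
val piped = inputRdd.pipe("./my_script.py")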
that the job fails when YARN kills the container because the python script
is going beyond the memory limits. I get something like this in the log:
Hi,
I'm writing a Spark program where I want to divide an RDD into different
groups, but the groups are too big to use groupByKey. To cope with that,
since I know in advance the list of keys for each group, I build a map from
the keys to the RDDs that result from filtering the input RDD to get the
just in pure
Java / Scala to do whatever you need inside your function.
On Wed, Feb 18, 2015 at 2:12 PM, Juan Rodríguez Hortalá
juan.rodriguez.hort...@gmail.com wrote:
Hi Paweł,
Thanks a lot for your answer. I finally got the program to work by using
aggregateByKey, but I was wondering why
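For the record, the shape of the call that worked is roughly this (a sketch;
zeroValue, seqOp and combOp stand in for my real aggregation):

// aggregateByKey folds the values of each key without materializing whole
// groups in memory, unlike groupByKey.
val aggregated = inputRdd.aggregateByKey(zeroValue)(
  (acc, value) => seqOp(acc, value),   // fold a value into a partition-local accumulator
  (acc1, acc2) => combOp(acc1, acc2)   // merge accumulators from different partitions
)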
into memory issues.
Please let me know if that helped
Pawel Szulc
@rabbitonweb
http://www.rabbitonweb.com
On Wed, Feb 18, 2015 at 12:06 PM, Juan Rodríguez Hortalá
juan.rodriguez.hort...@gmail.com wrote:
Hi,
I'm writing a Spark program where I want to divide an RDD into different
groups
is not that a problem (especially if
you can cache it), and in fact it's way better to take advantage of the
resilience etc. that comes with Spark.
my2c
andy
On Wed Dec 17 2014 at 7:07:05 PM Juan Rodríguez Hortalá
juan.rodriguez.hort...@gmail.com wrote:
Hi Andy, thanks for your response. I
Hi all,
I would like to be able to split an RDD into two pieces according to a
predicate. That would be equivalent to applying filter twice, with the
predicate and its complement, which is also similar to Haskell's partition
list function (
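What I have in mind is a helper like the following (a sketch of today's
workaround based on filtering twice):

import org.apache.spark.rdd.RDD

// Equivalent of Haskell's partition: the elements satisfying the predicate
// and the rest. This still traverses rdd twice, so caching it first helps.
def partitionRDD[A](rdd: RDD[A])(p: A => Boolean): (RDD[A], RDD[A]) =
  (rdd.filter(p), rdd.filter(x => !p(x)))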
you'll have to do is to partition each partition (:-D) or create two RDDs by
filtering twice → hence tasks will be scheduled distinctly, and data read
twice. Choose
what's best for you!
hth,
andy
On Wed Dec 17 2014 at 5:57:56 PM Juan Rodríguez Hortalá
Hi list,
In an excellent blog post on Kafka and Spark Streaming integration (
http://www.michael-noll.com/blog/2014/10/01/kafka-spark-streaming-integration-example-tutorial/),
Michael Noll poses an assumption about the number of partitions of the RDDs
created by input DStreams. He says his
On Wed, Jul 30, 2014 at 2:13 PM, Juan Rodríguez Hortalá
juan.rodriguez.hort...@gmail.com wrote:
Akhil, Andy, thanks a lot for your suggestions. I will take a look at
those JVM options.
Greetings,
Juan
2014-07-28 18:56 GMT+02:00 andy petrella andy.petre...@gmail.com:
Also check the guides
it by heart :/
On Jul 28, 2014, at 18:44, Akhil Das ak...@sigmoidanalytics.com wrote:
A quick fix would be to implement java.io.Serializable in those classes
which are causing this exception.
Thanks
Best Regards
On Mon, Jul 28, 2014 at 9:21 PM, Juan Rodríguez Hortalá
for CDH4.6 maybe?
they are certainly built vs Hadoop 2)
On Wed, Jul 16, 2014 at 10:32 AM, Juan Rodríguez Hortalá
juan.rodriguez.hort...@gmail.com wrote:
Hi,
I'm running a Java program using Spark Streaming 1.0.0 on Cloudera 4.4.0
quickstart virtual machine, with hadoop-client 2.0.0-mr1-cdh4.4.0
Hi,
I'm running a Java program using Spark Streaming 1.0.0 on the Cloudera 4.4.0
quickstart virtual machine, with hadoop-client 2.0.0-mr1-cdh4.4.0, which is
the one corresponding to my Hadoop distribution, and that works with other
MapReduce programs, and with the Maven property
program to prevent resource leakage. A good choice is
to use a third-party DB connection library which supports connection pooling;
that will alleviate your programming effort.
Thanks
Jerry
*From:* Juan Rodríguez Hortalá [mailto:juan.rodriguez.hort...@gmail.com]
*Sent:* Tuesday, July 08, 2014 6
) or multiple times (to deal with failures/timeouts)
etc., which is maybe something you want to deal with in your SQL.
Tobias
On Tue, Jul 8, 2014 at 3:40 AM, Juan Rodríguez Hortalá
juan.rodriguez.hort...@gmail.com wrote:
Hi list,
I'm writing a Spark Streaming program that reads from a kafka
task can get this connection each time it
executes a task, instead of creating a new one. That would be good for your
scenario, since creating a connection is quite expensive for each task.
Thanks
Jerry
*From:* Juan Rodríguez Hortalá [mailto:juan.rodriguez.hort...@gmail.com]
*Sent:* Tuesday, July 08
Hi list,
I'm writing a Spark Streaming program that reads from a kafka topic,
performs some transformations on the data, and then inserts each record in
a database with foreachRDD. I was wondering what the best way is to handle
the connection to the database so each worker, or even each task,
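The pattern I am considering opens one connection per partition rather than
per record, roughly as below (a sketch; createConnection and insertRecord
stand for the real database calls):

dstream.foreachRDD { rdd =>
  rdd.foreachPartition { records =>
    // one connection per partition (i.e. per task), reused for all its records
    val connection = createConnection()      // or borrow one from a connection pool
    try {
      records.foreach(record => insertRecord(connection, record))
    } finally {
      connection.close()                     // or return it to the pool
    }
  }
}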
Hi all,
I'm writing a Spark Streaming program that uses reduceByKeyAndWindow(), and
when I change the windowLength or slidingInterval I get the following
exceptions, running in local mode
14/07/06 13:03:46 ERROR actor.OneForOneStrategy: key not found:
1404677026000 ms
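For context, the call looks roughly like this (a sketch; the durations are
illustrative, and both the window length and the sliding interval must be
multiples of the batch interval):

import org.apache.spark.streaming.Seconds

// Sum the values of each key over a 30 second window, recomputed every 10 seconds.
val windowedCounts = pairDStream.reduceByKeyAndWindow(
  (a: Int, b: Int) => a + b,   // reduce function
  Seconds(30),                 // window length
  Seconds(10)                  // sliding interval
)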
Hi,
To cope with the issue with META-INF that Sean is pointing out, my solution
is replacing the maven-assembly-plugin with the maven-shade-plugin, using the
ServicesResourceTransformer (
http://maven.apache.org/plugins/maven-shade-plugin/examples/resource-transformers.html#ServicesResourceTransformer)