).map(_._2)
streamtoread.sample(withReplacement = true, fraction = fraction)
How do I use the sample method
(http://spark.apache.org/docs/latest/programming-guide.html#transformations)
with Spark Streaming?
Thanks,
Josh
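
For what it's worth, a minimal sketch of one way to do this, assuming the
goal is to sample each micro-batch: DStream itself has no sample method, so
RDD.sample is applied through transform (streamtoread and fraction are the
names from the question above):

  import org.apache.spark.streaming.dstream.DStream

  // Apply RDD.sample to every batch of the stream via transform.
  def sampleStream(streamtoread: DStream[String], fraction: Double): DStream[String] =
    streamtoread.transform(rdd =>
      rdd.sample(withReplacement = true, fraction = fraction))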
/ stage / task progress information, as well as expanding the
types of information exposed through the stable status API interface.
- Josh
On Thu, Dec 25, 2014 at 10:01 AM, Eric Friedman eric.d.fried...@gmail.com
wrote:
Spark 1.2.0 is SO much more usable than previous releases -- many thanks
a bit of additional context in the meantime.
- Josh
On Thu, Dec 25, 2014 at 5:36 PM, Tobias Pfeiffer t...@preferred.jp wrote:
Nick,
uh, I would have expected a rather heated discussion, but the opposite
seems to be the case ;-)
Independent of my personal preferences w.r.t. usability, habits etc
will be sent to both spark.incubator.apache.org and spark.apache.org (if
that is the case, I'm not sure which alias Nabble posts get sent to) would
make things a lot more clear.
On Sat, Dec 13, 2014 at 5:05 PM, Josh Rosen rosenvi...@gmail.com wrote:
I've noticed that several users are attempting to post
by the mailing list.
I wanted to mention this issue to the Spark community to see whether there
are any good solutions to address this. I have spoken to users who think
that our mailing list is unresponsive / inactive because their un-posted
messages haven't received any replies.
- Josh
SerializableMapWrapper was added in
https://issues.apache.org/jira/browse/SPARK-3926; do you mind opening a new
JIRA and linking it to that one?
On Mon, Dec 1, 2014 at 12:17 AM, lokeshkumar lok...@dataken.net wrote:
The workaround was to wrap the map returned by the Spark libraries in a HashMap.
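
A minimal sketch of that workaround, assuming the returned map arrives as a
java.util.Map:

  import java.util.HashMap

  // Copy the non-serializable map wrapper into a plain java.util.HashMap
  // before serializing it.
  def makeSerializable[K, V](m: java.util.Map[K, V]): HashMap[K, V] =
    new HashMap[K, V](m)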
can maintain exactly once
semantics when writing to topic 2?
Thanks,
Josh
Is there a way to do this that preserves exactly once semantics for the
write to Kafka?
On Tue, Sep 2, 2014 at 12:30 PM, Tim Smith secs...@gmail.com wrote:
I'd be interested in finding the answer too. Right now, I do:
val kafkaOutMsgs = kafkInMessages.map(x => myFunc(x._2, someParam))
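
For reference, a hedged sketch of the usual write-to-Kafka pattern using the
Kafka 0.8 producer API; the broker list and topic name are placeholders. Note
that this gives at-least-once delivery, not exactly-once: a retried task may
resend messages.

  import java.util.Properties
  import kafka.producer.{KeyedMessage, Producer, ProducerConfig}
  import org.apache.spark.streaming.dstream.DStream

  // One producer per partition of each batch.
  def writeToKafka(stream: DStream[String], brokers: String, topic: String): Unit =
    stream.foreachRDD { rdd =>
      rdd.foreachPartition { partition =>
        val props = new Properties()
        props.put("metadata.broker.list", brokers)
        props.put("serializer.class", "kafka.serializer.StringEncoder")
        val producer = new Producer[String, String](new ProducerConfig(props))
        partition.foreach(msg => producer.send(new KeyedMessage[String, String](topic, msg)))
        producer.close()
      }
    }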
also do a
lot more with it than just the Phoenix functions provide.
I don't know if this works with PySpark or not, but assuming the
'newHadoopRDD' functionality works for other input formats, it should work
for Phoenix as well.
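
As a sketch of that mechanism only (TextInputFormat stands in here for a
Phoenix input format, which would be wired up the same way):

  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.io.{LongWritable, Text}
  import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
  import org.apache.spark.SparkContext

  // newAPIHadoopRDD can read any Hadoop InputFormat.
  def readInputFormat(sc: SparkContext, path: String) = {
    val conf = new Configuration()
    conf.set("mapreduce.input.fileinputformat.inputdir", path)
    sc.newAPIHadoopRDD(conf, classOf[TextInputFormat],
      classOf[LongWritable], classOf[Text])
  }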
Josh
On Fri, Nov 21, 2014 at 5:12 PM, Alaa Ali contact.a...@gmail.com
wrote:
https://github.com/simplymeasured/phoenix-spark
Josh
On Fri, Nov 21, 2014 at 4:14 PM, Alaa Ali contact.a...@gmail.com wrote:
I want to run queries on Apache Phoenix, which has a JDBC driver. The query
that I want to run is:
select ts,ename from random_data_date limit 10
But I'm having issues
Hi,
I was wondering if the adaptive stream processing and dynamic batch
processing were available to use in Spark Streaming? Could someone help
point me in the right direction?
Thanks,
Josh
Referring to this paper http://dl.acm.org/citation.cfm?id=2670995.
On Fri, Nov 14, 2014 at 10:42 AM, Josh J joshjd...@gmail.com wrote:
Hi,
I was wondering if the adaptive stream processing and dynamic batch
processing were available to use in Spark Streaming? Could someone help
point me
Hi,
Is it possible to concatenate or append two Dstreams together? I have an
incoming stream that I wish to combine with data that's generated by a
utility. I then need to process the combined Dstream.
Thanks,
Josh
I think it's just called union
On Tue, Nov 11, 2014 at 2:41 PM, Josh J joshjd...@gmail.com wrote:
Hi,
Is it possible to concatenate or append two Dstreams together? I have an
incoming stream that I wish to combine with data that's generated by a
utility. I then need to process the combined
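
A minimal sketch of the union suggestion, with illustrative stream names:

  import org.apache.spark.streaming.dstream.DStream

  // union concatenates two DStreams batch-by-batch.
  def combine(incoming: DStream[String], generated: DStream[String]): DStream[String] =
    incoming.union(generated)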
)
and
found : java.util.LinkedList[org.apache.spark.rdd.RDD[String]]
required: scala.collection.mutable.Queue[org.apache.spark.rdd.RDD[?]]
Thanks,
Josh
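
A minimal sketch of the fix implied by that error: pass a
scala.collection.mutable.Queue rather than a java.util.LinkedList:

  import scala.collection.mutable.Queue
  import org.apache.spark.rdd.RDD
  import org.apache.spark.streaming.StreamingContext

  // queueStream expects a Scala mutable Queue of RDDs.
  def makeStream(ssc: StreamingContext, rdds: Seq[RDD[String]]) = {
    val queue = Queue(rdds: _*)
    ssc.queueStream(queue)
  }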
: Ordering[K], implicit ctag:
scala.reflect.ClassTag[K])org.apache.spark.rdd.RDD[String].
Unspecified value parameter f.
On Tue, Nov 4, 2014 at 11:28 AM, Josh J joshjd...@gmail.com wrote:
Hi,
Does anyone have any good examples of using sortBy for RDDs in Scala?
I'm receiving
not enough
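
For reference, a minimal sketch that supplies the missing key function f
(sorting lines by length here is purely illustrative):

  import org.apache.spark.rdd.RDD

  // sortBy's first (and only required) argument is the key function f;
  // the "Unspecified value parameter f" error means it was not supplied.
  def sortByLength(rdd: RDD[String]): RDD[String] =
    rdd.sortBy(line => line.length)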
Hi,
Is there a nice or optimal method to randomly shuffle Spark Streaming RDDs?
Thanks,
Josh
? in general RDDs don't have
ordering at all -- excepting when you sort for example -- so a
permutation doesn't make sense. Do you just want a well-defined but
random ordering of the data? Do you just want to (re-)assign elements
randomly to partitions?
On Mon, Nov 3, 2014 at 4:33 PM, Josh J joshjd
is guaranteed about that.
If you want to permute an RDD, how about a sortBy() on a good hash
function of each value plus some salt? (Haven't thought this through
much but sounds about right.)
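
A hedged sketch of that suggestion, with the salt drawn once on the driver:

  import scala.reflect.ClassTag
  import scala.util.Random
  import org.apache.spark.rdd.RDD

  // Impose a pseudo-random but well-defined order by sorting on a salted
  // hash of each element.
  def pseudoShuffle[T: ClassTag](rdd: RDD[T]): RDD[T] = {
    val salt = Random.nextInt()
    rdd.sortBy(x => (x.hashCode, salt).hashCode)
  }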
On Mon, Nov 3, 2014 at 4:59 PM, Josh J joshjd...@gmail.com wrote:
When I'm outputting the RDDs
Hi,
I've written a short Scala app to perform word counts on a text file and am
getting the following exception as the program completes (after it prints
out all of the word counts).
Exception in thread delete Spark temp dir
C:\Users\Josh\AppData\Local\Temp\spark-0fdd0b79-7329-4690-a093
Hi,
How do I run multiple Spark applications in parallel? I tried to run on
yarn cluster, though the second application submitted does not run.
Thanks,
Josh
, Intel(R) Xeon(R) CPU E5-2420 v2 @ 2.20GHz
32 GB RAM
Thanks,
Josh
On Tue, Oct 28, 2014 at 4:15 PM, Soumya Simanta soumya.sima...@gmail.com
wrote:
Try reducing the resources (cores and memory) of each application.
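
A minimal sketch of capping an application's resources via SparkConf; the
values are illustrative and spark.executor.instances assumes YARN:

  import org.apache.spark.SparkConf

  // Cap each application's executors so several apps fit on the cluster
  // at once.
  val conf = new SparkConf()
    .set("spark.executor.instances", "4")
    .set("spark.executor.cores", "2")
    .set("spark.executor.memory", "4g")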
On Oct 28, 2014, at 7:05 PM, Josh J joshjd...@gmail.com wrote:
Hi,
How
than once in the event of a worker
failure.
http://spark.apache.org/docs/latest/streaming-programming-guide.html#failure-of-a-worker-node
Thanks,
Josh
Hi,
How could I combine RDDs? I would like to combine two RDDs if the count in
an RDD is not above some threshold.
Thanks,
Josh
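
One hedged sketch of that, assuming "not above some threshold" refers to the
first RDD's count:

  import org.apache.spark.rdd.RDD

  // Union the two RDDs only while the first stays under the threshold.
  def combineIfSmall[T](a: RDD[T], b: RDD[T], threshold: Long): RDD[T] =
    if (a.count() <= threshold) a.union(b) else a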
Hi,
Are there Dockerfiles available that allow setting up a Docker Spark 1.1.0
cluster?
Thanks,
Josh
Hi,
How can I join neighboring sliding windows in Spark Streaming?
Thanks,
Josh
drivers
and workers.
- Josh
On Fri, Oct 10, 2014 at 5:24 PM, Andy Davidson
a...@santacruzintegration.com wrote:
Hi
I am running Spark on an EC2 cluster. I need to update Python to 2.7. I
have been following the directions on
http://nbviewer.ipython.org/gist/JoshRosen/6856670
Hi Theo,
Check out *spark-perf*, a suite of performance benchmarks for Spark:
https://github.com/databricks/spark-perf.
- Josh
On Fri, Oct 10, 2014 at 7:27 PM, Theodore Si sjyz...@gmail.com wrote:
Hi,
Let's say that I managed to port Spark from TCP/IP to RDMA.
What tool or benchmark can I
/2144
- Josh
On Fri, Oct 3, 2014 at 6:44 PM, tomo cocoa cocoatom...@gmail.com wrote:
Hi,
I would prefer that PySpark could also be executed on Python 3.
Do you have a particular reason or need to use PySpark with Python 3?
If you create an issue on JIRA, I would try to resolve it.
On 4 October
If I recall, you should be able to start Hadoop MapReduce using
~/ephemeral-hdfs/sbin/start-mapred.sh.
On Sun, Sep 7, 2014 at 6:42 AM, Tomer Benyamini tomer@gmail.com wrote:
Hi,
I would like to copy log files from s3 to the cluster's
ephemeral-hdfs. I tried to use distcp, but I guess
of countByWindow with a
function that performs the save operation.
On Fri, Aug 22, 2014 at 1:58 AM, Josh J joshjd...@gmail.com wrote:
Hi,
Hopefully a simple question: is there an example of where to save the
output of countByWindow? I would like to save the results to external
storage (kafka
Hi,
Hopefully a simple question: is there an example of where to save the
output of countByWindow? I would like to save the results to external
storage (kafka or redis). The examples show only stream.print()
Thanks,
Josh
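
A minimal sketch of saving the counts instead of printing them; the actual
Kafka or Redis write is left as a comment since it depends on the client
library, and the window sizes are illustrative:

  import org.apache.spark.streaming.Seconds
  import org.apache.spark.streaming.dstream.DStream

  // Replace print() with foreachRDD and push each windowed count out.
  // Note: countByWindow requires ssc.checkpoint(...) to be set.
  def saveCounts(stream: DStream[String]): Unit =
    stream.countByWindow(Seconds(30), Seconds(10)).foreachRDD { rdd =>
      rdd.collect().foreach { count =>
        // send `count` to Kafka or Redis here
        println(count)
      }
    }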
windowMessages1 =
messages.window(windowLength,slideInterval);
JavaPairDStream<String,String> windowMessages2 =
messages.window(windowLength,slideInterval);
Thanks,
Josh
DStream. How can I accomplish this with Spark?
Sincerely,
Josh
Hi,
What's the difference between the amplab docker scripts
(https://github.com/amplab/docker-scripts) and the Spark docker directory
(https://github.com/apache/spark/tree/master/docker)?
Thanks,
Josh
Has anyone tried using functools.partial (
https://docs.python.org/2/library/functools.html#functools.partial) with
PySpark? If it works, it might be a nice way to address this use-case.
On Sun, Aug 17, 2014 at 7:35 PM, Davies Liu dav...@databricks.com wrote:
On Sun, Aug 17, 2014 at 11:21 AM,
, upper bound index, and number of partitions. With that example
query and those values, you should end up with an RDD with two partitions,
one with the student_info from 1 through 10, and the second with ids 11
through 20.
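
Putting those parameters together, a hedged sketch (the Phoenix JDBC URL and
column names are assumptions for illustration):

  import java.sql.DriverManager
  import org.apache.spark.SparkContext
  import org.apache.spark.rdd.JdbcRDD

  // JdbcRDD substitutes each partition's bounds into the two '?'
  // placeholders; 1 to 20 over 2 partitions gives ids 1-10 and 11-20.
  def studentInfo(sc: SparkContext) = new JdbcRDD(
    sc,
    () => DriverManager.getConnection("jdbc:phoenix:localhost"),
    "SELECT id, name FROM student_info WHERE id >= ? AND id <= ?",
    lowerBound = 1, upperBound = 20, numPartitions = 2,
    r => (r.getInt(1), r.getString(2)))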
Josh
On Wed, Jul 30, 2014 at 6:58 PM, chaitu reddy chaitzre...@gmail.com
You have to use `myBroadcastVariable.value` to access the broadcasted
value; see
https://spark.apache.org/docs/latest/programming-guide.html#broadcast-variables
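
A minimal sketch of that pattern, with an illustrative set:

  import org.apache.spark.SparkContext
  import org.apache.spark.rdd.RDD

  // Read the broadcast value with .value inside the closure.
  def filterWithBroadcast(sc: SparkContext, rdd: RDD[String], s: Set[String]): RDD[String] = {
    val broadcastSet = sc.broadcast(s)
    rdd.filter(x => broadcastSet.value.contains(x))
  }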
On Fri, Jul 18, 2014 at 2:56 PM, Vedant Dhandhania
ved...@retentionscience.com wrote:
Hi All,
I am trying to broadcast a set in a
submit jobs to the cluster either.
Thanks!
Josh
Jeremy,
Just to be clear, are you assembling a jar with that class compiled (with
its dependencies) and including the path to that jar on the command line in
an environment variable (e.g. SPARK_CLASSPATH=path ./spark-shell)?
--j
On Saturday, May 24, 2014, Jeremy Lewi jer...@lewi.us wrote:
Hi
...@gmail.com
wrote:
Which version is this with? I haven’t seen standalone masters lose
workers. Is there other stuff on the machines that’s killing them, or what
errors do you see?
Matei
On May 16, 2014, at 9:53 AM, Josh Marcus
jmar...@meetup.com
, 2014 at 3:28 PM, Josh Marcus jmar...@meetup.com wrote:
We're using Spark 0.9.0, and we're using it out of the box -- not using
Cloudera Manager or anything similar.
There are warnings from the master that there continue to be heartbeats
from the unregistered workers. I will see
and messages semi-regularly on CDH5 + 0.9.0. I don't have any insight
into when it happens, but usually after heavy use and after running
for a long time. I had figured I'd see if the changes since 0.9.0
addressed it and revisit later.
On Tue, May 20, 2014 at 8:37 PM, Josh Marcus jmar
Hey folks,
I'm wondering what strategies other folks are using for maintaining and
monitoring the stability of standalone Spark clusters.
Our master very regularly loses workers, and they (as expected) never
rejoin the cluster. This is the same behavior I've seen
using akka cluster (if that's
or not though, so if anyone else is looking into this,
I'd love to hear their thoughts.
Josh
On Tue, Apr 8, 2014 at 1:00 PM, Nicholas Chammas nicholas.cham...@gmail.com
wrote:
Just took a quick look at the overview here:
http://phoenix.incubator.apache.org/
and the quick start guide here: http
Nope, nested RDDs aren't supported:
https://groups.google.com/d/msg/spark-users/_Efj40upvx4/DbHCixW7W7kJ
https://groups.google.com/d/msg/spark-users/KC1UJEmUeg8/N_qkTJ3nnxMJ
https://groups.google.com/d/msg/spark-users/rkVPXAiCiBk/CORV5jyeZpAJ
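
For illustration, a hedged sketch of the usual restructuring: collect the
smaller dataset to the driver and capture the local copy in the closure
instead of the RDD:

  import org.apache.spark.rdd.RDD

  // An RDD cannot appear inside another RDD's closure; bring one side
  // local first.
  def countMatches(big: RDD[String], small: RDD[String]): Long = {
    val lookup = small.collect().toSet
    big.filter(lookup.contains).count()
  }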
On Sun, Mar 2, 2014 at 5:37 PM, Cosmin Radoi