arkContext.(JavaSparkContext.scala:58)
--
jay vyas
not too worried about this - but it seems like it might be nice if maybe we could
specify a user name as part of Spark's context, or as an external parameter,
rather than having to use the Java-based user/group extractor.
--
jay vyas
a producer and a consumer, so that you
don't get a starvation scenario.
On Wed, Aug 12, 2015 at 7:31 PM, Mohit Anchlia mohitanch...@gmail.com
wrote:
Is there a way to run Spark Streaming methods in a standalone Eclipse
environment to test out the functionality?
--
jay vyas
In general the simplest way is to use the DynamoDB Java API as is and
call it inside a map(), using the asynchronous put() DynamoDB API call.
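For instance, a rough sketch of that pattern (the table name, attribute names, and sample data are just placeholders, and it assumes the AWS SDK v1 DynamoDB client is on the classpath):

import com.amazonaws.services.dynamodbv2.AmazonDynamoDBAsyncClient
import com.amazonaws.services.dynamodbv2.model.{AttributeValue, PutItemRequest}
import org.apache.spark.{SparkConf, SparkContext}
import scala.collection.JavaConverters._

object DynamoWriteSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("dynamo-put").setMaster("local[2]"))
    val rows = sc.parallelize(Seq(("1", "foo"), ("2", "bar")))  // placeholder data

    rows.foreachPartition { part =>
      // one client per partition (not per record); default AWS credential chain
      val client = new AmazonDynamoDBAsyncClient()
      val futures = part.map { case (id, value) =>
        val item = Map(
          "id"    -> new AttributeValue().withS(id),
          "value" -> new AttributeValue().withS(value)
        ).asJava
        client.putItemAsync(new PutItemRequest().withTableName("my-table").withItem(item))
      }.toList
      futures.foreach(_.get())  // make sure the async puts finish before the task ends
      client.shutdown()
    }
    sc.stop()
  }
}

Creating the client inside foreachPartition avoids trying to serialize it from the driver.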
On Aug 7, 2015, at 9:08 AM, Yasemin Kaya godo...@gmail.com wrote:
Hi,
Is there a way of using DynamoDB in a Spark application? I have to
, 2015 at 11:11 PM, Dogtail Ray spark.ru...@gmail.com wrote:
Hi,
I have modified some Hadoop code, and want to build Spark with the
modified version of Hadoop. Do I need to change the compilation dependency
files? If so, how? Many thanks!
--
jay vyas
Please let me know if I am doing something wrong.
--
jay vyas
Just the same as Spark was disrupting the Hadoop ecosystem by changing the
assumption that you can't rely on memory in distributed analytics... now
maybe we are challenging the assumption that big data analytics needs to be
distributed?
I've been asking the same question lately and seen similarly that
(PoolingHttpClientConnectionManager.java:114)
--
jay vyas
- https://wiki.apache.org/incubator/IgniteProposal has, I think, been updated
recently and has a good comparison.
- Although GridGain has been around since the Spark days, Apache Ignite is
quite new and just getting started, I think, so
- you will probably want to reach out to the developers!
--
jay vyas
Ah, never mind, I just saw
http://spark.apache.org/docs/1.2.0/sql-programming-guide.html (language-integrated
queries), which looks quite similar to what I was thinking about. I'll give that a whirl...
On Wed, Feb 11, 2015 at 7:40 PM, jay vyas jayunit100.apa...@gmail.com
wrote:
Hi spark
).by(product,meta=product.id=meta.id). toSchemaRDD ?
I know the above snippet is totally wacky, but you get the idea :)
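For what it's worth, the closest thing I can sketch today uses registerTempTable plus a SQL string rather than a pure DSL join (the case classes and data below are made up, and the Spark 1.2-era createSchemaRDD implicit is assumed):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

case class Product(id: Int, name: String)
case class Meta(id: Int, category: String)

object JoinSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("join-sketch").setMaster("local[2]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD  // implicitly turns an RDD of case classes into a SchemaRDD

    val products = sc.parallelize(Seq(Product(1, "widget"), Product(2, "gadget")))
    val meta     = sc.parallelize(Seq(Meta(1, "hardware")))

    products.registerTempTable("product")
    meta.registerTempTable("meta")

    sqlContext.sql(
      "SELECT p.name, m.category FROM product p JOIN meta m ON p.id = m.id"
    ).collect().foreach(println)
    sc.stop()
  }
}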
--
jay vyas
just for
dealing with timestamps.
What's the simplest and cleanest way to map non-Spark time values into
SparkSQL-friendly time values? (A rough sketch of the kind of mapping I mean is below.)
- One option could be a custom SparkSQL type, I guess?
- Any plan to have native Spark SQL support for Joda Time or (yikes)
java.util.Calendar?
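Concretely, something like this, hand-converting a made-up Joda field into java.sql.Timestamp before handing the RDD to SparkSQL (Spark 1.2-style SchemaRDD API assumed):

import java.sql.Timestamp
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.joda.time.DateTime

case class RawEvent(id: Int, ts: DateTime)  // Joda on the way in
case class Event(id: Int, ts: Timestamp)    // java.sql.Timestamp is what SparkSQL understands

object TimestampSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("timestamps").setMaster("local[2]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD

    val raw    = sc.parallelize(Seq(RawEvent(1, DateTime.now())))
    val events = raw.map(e => Event(e.id, new Timestamp(e.ts.getMillis)))

    events.registerTempTable("events")
    sqlContext.sql("SELECT id, ts FROM events").collect().foreach(println)
    sc.stop()
  }
}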
--
jay vyas
It's a very valid idea indeed, but... it's a tricky subject, since the entire
ASF is run on mailing lists; hence there are so many different but equally
sound ways of looking at this idea, which conflict with one another.
On Jan 21, 2015, at 7:03 AM, btiernay btier...@hotmail.com wrote:
I
I find importing a working SBT project into IntelliJ is the way to go.
How did you load the project into IntelliJ?
On Jan 13, 2015, at 4:45 PM, Enno Shioji eshi...@gmail.com wrote:
Had the same issue. I can't remember what the issue was but this works:
libraryDependencies ++= {
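His snippet is cut off above, but for reference, a minimal Spark build.sbt generally looks something like the following (the versions here are only placeholders):

name := "spark-app"

scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"      % "1.2.0" % "provided",
  "org.apache.spark" %% "spark-streaming" % "1.2.0" % "provided"
)

With that in place, importing the project via IntelliJ's SBT support picks the dependencies up directly.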
it just process them?
Asim
--
jay vyas
https://github.com/jayunit100/SparkStreamingCassandraDemo
On this note, I've built a framework which is mostly pure, so that functional
unit tests can be run composing mock data for Twitter statuses, with just
regular JUnit... That might be relevant also.
I think at some point we should come
Here's an example of a Cassandra ETL that you can follow, which should exit on
its own. I'm using it as a blueprint for building Spark Streaming apps on top of.
For me, I kill the streaming app with System.exit after a sufficient amount of
data is collected.
That seems to work for most any
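To make that pattern concrete, a minimal sketch (the source, threshold, and sink are placeholders) looks roughly like:

import java.util.concurrent.atomic.AtomicLong
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StopAfterEnough {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("etl-blueprint").setMaster("local[2]"), Seconds(1))
    val seen = new AtomicLong(0L)

    val lines = ssc.socketTextStream("localhost", 9999)  // placeholder source
    lines.foreachRDD { rdd =>
      // ... write the batch out (e.g. to Cassandra) here ...
      if (seen.addAndGet(rdd.count()) >= 10000L) {
        System.exit(0)  // crude, but it reliably kills the streaming app
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}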
if one can point to an example library and how to run it :)
Thanks
--
jay vyas
This seems pretty standard: your IntelliJ classpath isn't matched to the
correct ones that are used in the Spark shell.
Are you using the SBT plugin? If not, how are you putting deps into IntelliJ?
On Nov 20, 2014, at 7:35 PM, Sanjay Subramanian
sanjaysubraman...@yahoo.com.INVALID wrote:
that only after all the batch data was in?
Thanks
--
jay vyas
Yup, very important that n > 1 for Spark Streaming jobs. If local, use
local[2].
The thing to remember is that your Spark receiver will take a thread to itself
and produce data, so you need another thread to consume it.
In a cluster manager like YARN or Mesos, the word "thread" is not used
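For the local case, that means something like this (the socket source is just for illustration; any receiver-based source behaves the same way):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object NeedsTwoThreads {
  def main(args: Array[String]): Unit = {
    // local[2]: one thread for the receiver, one to process the batches it produces.
    // With local[1] the receiver takes the only thread and no output ever appears.
    val conf = new SparkConf().setAppName("needs-two-threads").setMaster("local[2]")
    val ssc  = new StreamingContext(conf, Seconds(1))

    val lines = ssc.socketTextStream("localhost", 9999)
    lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).print()

    ssc.start()
    ssc.awaitTermination()
  }
}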
A use case would be helpful?
Batches of RDDs from streams are going to have a temporal ordering in terms of
when they are processed in a typical application..., but maybe you could
shuffle the way batch iterations work
On Nov 3, 2014, at 11:59 AM, Josh J joshjd...@gmail.com wrote:
When
of filtering the data collection
times
the #buckets?
thanks, Gerard.
--
jay vyas
Hi Spark ! I found out why my RDDs weren't coming through in my Spark stream.
It turns out that onStart() needs to return, it seems - i.e. you need to launch
the worker part of your start process in a thread. For example:
def onStartMock(): Unit = {
  // launch the worker on its own thread so that onStart() can return right away
  new Thread(new Runnable { def run(): Unit = { /* receive data and store() it here */ } }).start()
}
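A slightly fuller sketch of the same idea, written as a custom receiver (the class and record names are made up):

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

class MockReceiver extends Receiver[String](StorageLevel.MEMORY_ONLY) {
  override def onStart(): Unit = {
    // spawn the worker and return immediately, so onStart() itself doesn't block
    new Thread("mock-receiver") {
      override def run(): Unit = {
        var i = 0
        while (!isStopped()) {
          store(s"record-$i")  // hand records to Spark
          i += 1
          Thread.sleep(100)
        }
      }
    }.start()
  }

  override def onStop(): Unit = {}  // the worker thread checks isStopped() and exits on its own
}

You hook it up with ssc.receiverStream(new MockReceiver).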
, Oct 21, 2014 at 11:02 AM, jay vyas jayunit100.apa...@gmail.com
wrote:
Hi Spark ! I found out why my RDDs weren't coming through in my Spark stream.
It turns out that onStart() needs to return, it seems - i.e. you need to launch
the worker part of your start process in a thread
be very useful.
--
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com
--
jay vyas
Hi spark !
I don't quite understand the semantics of RDDs in a streaming context very well yet.
Are there any examples of how to implement custom InputDStreams, with
corresponding Receivers, in the docs?
I've hacked together a custom stream, which is being opened and is
consuming data
from pyspark import SparkContext

sc = SparkContext(appName="pyStreamingSparkRDDPipe")
data = [1, 2, 3, 4, 5]
rdd = sc.parallelize(data)
def echo(data):
    print("python received: %s" % data)  # output winds up in the shell console in my cluster (i.e. the machine I launched pyspark from)
rdd.foreach(echo)
print("we are done")
--
jay vyas
is
for this; possibly I could lend a hand if there are any loose ends needing
to be tied.
--
jay vyas
the standard out from the process as its output (I assume that is the most
common implementation)?
Incidentally, I have not been able to use the pipe command to run an
external process yet, so any hints on that would be appreciated.
--
jay vyas
, this is essentially an implementation of something analogous to Hadoop's
streaming API.
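As a quick illustration of those semantics (assuming the tr utility is present on every worker node):

import org.apache.spark.{SparkConf, SparkContext}

object PipeSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("pipe-sketch").setMaster("local[2]"))

    val words = sc.parallelize(Seq("spark", "pipe", "demo"))
    // each element is written to tr's stdin (one per line);
    // each line tr prints to stdout becomes an element of the resulting RDD
    val upper = words.pipe(Seq("tr", "a-z", "A-Z"))
    upper.collect().foreach(println)  // SPARK, PIPE, DEMO

    sc.stop()
  }
}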
On Sun, Jul 20, 2014 at 4:09 PM, jay vyas jayunit100.apa...@gmail.com
wrote:
According to the API docs for the pipe operator,
def pipe(command: String): RDD
http://spark.apache.org/docs/1.0.0/api/scala/org/apache
I think I know what is happening to you. I've looked into this some just this
week, so it's fresh in my brain :) hope this helps.
When no workers are known to the master, IIRC, you get this message.
I think this is how it works.
1) You start your master.
2) You start a slave, and give it
the slaves can be ephemeral. Since the
master is fixed, though, a new slave can reconnect at any time.
On Mon, Jul 14, 2014 at 10:01 PM, jay vyas jayunit100.apa...@gmail.com
wrote:
Hi spark !
What is the purpose of the randomly assigned SPARK_WORKER_PORT?
From the documentation it says
please just point me to the right documentation if I'm missing something obvious :)
thanks !
--
jay vyas