intrinsic reasons for
this to be impossible?
Sorry again for the giant mail, and thanks for any insights!
Andras
--
Dean Wampler, Ph.D.
Typesafe
@deanwampler
http://typesafe.com
http://polyglotprogramming.com
comment?
*From:* Dean Wampler deanwamp...@gmail.com
*Sent:* Thursday, April 10, 2014 7:39 AM
*To:* Spark Users user@spark.apache.org
*Cc:* Daniel Darabos daniel.dara...@lynxanalytics.com, Andras Barjak andras.bar...@lynxanalytics.com
Spark has been endorsed by Cloudera
and Distributed Systems
Shanghai Jiao Tong University
Email: yanzhe...@gmail.com
, 2014 at 2:38 PM, Jaonary Rabarisoa jaon...@gmail.com wrote:
Hi all,
I'm just wondering if hybrid GPU/CPU computation is something that is
feasible with Spark, and what the best way to do it would be.
Cheers,
Jaonary
of this?
Thanks,
Dave
I meant to post this last week, but this is a talk I gave at the Philly ETE
conf. last week:
http://www.slideshare.net/deanwampler/spark-the-next-top-compute-model
Also here:
http://polyglotprogramming.com/papers/Spark-TheNextTopComputeModel.pdf
dean
/GCS). Why configure Hadoop if you don't have to?
On Thu, May 1, 2014 at 12:25 AM, Dean Wampler deanwamp...@gmail.com wrote:
I meant to post this last week, but this is a talk I gave at the Philly
ETE conf. last week:
http://www.slideshare.net/deanwampler/spark-the-next-top-compute-model
in context: Spark Training http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Training-tp5166.html
Sent from the Apache Spark User List mailing list archive http://apache-spark-user-list.1001560.n3.nabble.com/ at Nabble.com.
scratch the surface - check out the release notes here:
http://spark.apache.org/releases/spark-release-1-0-0.html
Note that since release artifacts were posted recently, certain
mirrors may not have working downloads for a few hours.
- Patrick
in the Hadoop ecosystem. I think Dataflows is
more than that but yeah that seems to be some of the 'language'. It is
similar in that it is a distributed collection abstraction.
http://apache-spark-user-list.1001560.n3.nabble.com/Recommended-pipeline-automation-tool-Oozie-tp9319.html
It looked like you were running in standalone mode (master set to
local[4]). That's how I ran it.
Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
http://shop.oreilly.com/product/0636920033073.do (O'Reilly)
Typesafe http://typesafe.com
@deanwampler http://twitter.com/deanwampler
http://polyglotprogramming.com
Can you post your whole SBT build file(s)?
Sorry, I meant any *other* SBT files.
However, what happens if you remove the line:
exclude("org.eclipse.jetty.orbit", "javax.servlet")
dean
https://spark.apache.org/docs/latest/running-on-mesos.html
(the unchanged parts) to make efficient copies.
Also, Scala Vector isn't designed to represent sparse vectors.
dean
You are creating a HiveContext, then using the sql method instead of hql.
Is that deliberate?
The code doesn't work if you replace HiveContext with SQLContext. Lots of
exceptions are thrown, but I don't have time to investigate now.
dean
Any particular reason you're not just downloading a build from
http://spark.apache.org/downloads.html? Even if you aren't using Hadoop, any
of those builds will work.
If you want to build from source, the Maven build is more reliable.
dean
aDstream.transform(_.distinct()) will only make the elements of each RDD
in the DStream distinct, not for the whole DStream globally. Is that what
you're seeing?
+ ... (2 + (2 + (2 + 0 + p_1) + p_2) + p_3) ...)
Both spark-submit and spark-shell have a --jars option for passing
additional jars to the cluster. They will be added to the appropriate
classpaths.
the first element
(being careful that the partition isn't empty!) and then determine which of
those first lines has the header info.
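A minimal sketch of that idea in plain Scala, using nested collections to stand in for an RDD's partitions (the data and the `looksLikeHeader` heuristic are illustrative, not from the original thread); in Spark you would do the same per-partition work with mapPartitionsWithIndex:

```scala
// Each inner Seq stands in for one partition of the input file.
val partitions = Seq(
  Seq("name,age", "alice,30"),   // partition 0 happens to start with the header
  Seq("bob,25", "carol,41")      // later partitions hold only data
)

// Peek at the first element of each non-empty partition, as described above.
val firstLines = partitions.filter(_.nonEmpty).map(_.head)

// Decide which first line is the header (a simple, hypothetical heuristic).
def looksLikeHeader(line: String): Boolean = line.startsWith("name,")

// Drop the header wherever it appears; keep everything else.
val data = partitions.flatMap {
  case lines if lines.nonEmpty && looksLikeHeader(lines.head) => lines.tail
  case lines => lines
}
```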
For the Spark SQL parts, 1.3 breaks backwards compatibility, because before
1.3, Spark SQL was considered experimental where API changes were allowed.
So, H2O and ADA compatible with 1.2.X might not work with 1.3.
dean
but that shouldn't be this issue.
:51849/),
Path(/user/MapOutputTracker)]
It's trying to connect to an Akka actor on itself, using the loopback
address.
Try changing SPARK_LOCAL_IP to the publicly routable IP address.
dean
this is such a common problem, I usually define a parse method
that converts input text to the desired schema. It catches parse exceptions
like this and reports the bad line at least. If you can return a default
long in this case, say 0, that makes it easier to return something.
dean
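A sketch of that parse-method pattern in plain Scala (the name `parseLong` and the logging are illustrative, not from the original thread): catch the parse exception, report the bad line, and fall back to a default value.

```scala
// Parse a line into a Long, reporting bad input and returning a default
// (0 here, as suggested above) instead of letting the exception propagate.
def parseLong(line: String, default: Long = 0L): Long =
  try line.trim.toLong
  catch {
    case _: NumberFormatException =>
      Console.err.println(s"Bad input line: '$line', using $default")
      default
  }
```

In a Spark job this would typically be applied as `textLines.map(parseLong(_))`, so one malformed record doesn't kill the whole stage.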
Yes, that's the problem. The RDD class exists in both binary jar files, but
the signatures probably don't match. The bottom line, as always for tools
like this, is that you can't mix versions.
You can use the coalesce method to reduce the number of partitions. You can
reduce to one if the data is not too big. Then write the output.
In 1.2 it's a member of SchemaRDD and it becomes available on RDD (through
the type class mechanism) when you add a SQLContext, like so.
val sqlContext = new SQLContext(sc)
import sqlContext._
In 1.3, the method has moved to the new DataFrame type.
needed to satisfy the limit. In this case, it will
trivially stop at the first.
them as the same user. Or look at what the EC2 scripts do.
. Actually only
one would be enough, but the default number of partitions will be used. I
believe 8 is the default for Mesos. For local mode (local[*]), it's the
number of cores. You can also set the property spark.default.parallelism.
HTH,
Dean
closures passed to Spark methods, but that's
probably not what you want.
dean
SQL keyword.
HTH,
Dean
It failed to find the class org.apache.spark.sql.catalyst.ScalaReflection
in the Spark SQL library. Make sure it's on the classpath and the version
is correct, too.
I have a self-study workshop here:
https://github.com/deanwampler/spark-workshop
dean
to the left of the = pattern matches
on the input tuples.
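For illustration, the same left-of-the-`=` tuple pattern in plain Scala (the values here are made up):

```scala
val pair = ("spark", 3)

// The tuple pattern to the left of the = destructures the value on the right.
val (word, count) = pair

// The same idea inside a map over key-value tuples:
val sizes = Seq(("a", Seq(1, 2)), ("b", Seq(3))).map {
  case (key, values) => (key, values.size)
}
```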
That appears to work, with a few changes to get the types correct:
input.distinct().combineByKey((s: String) => 1, (agg: Int, s: String) => agg + 1, (agg1: Int, agg2: Int) => agg1 + agg2)
dean
of the
second (each CompactBuffer). An alternative pattern match syntax would be:
scala> val i2 = i1.map { case (key, buffer) => (key, buffer.size) }
This should work as long as none of the CompactBuffers are too large, which
could happen for extremely large data sets.
dean
Without the rest of your code it's hard to make sense of errors. Why do you
need to use reflection?
Make sure you use the same Scala version throughout; 2.10.4 is
recommended. That's still the official version for Spark, even though
provisional support for 2.11 exists.
and interoperating with external processes. Perhaps Java has
something similar these days?
dean
set, if necessary.
HOWEVER, it actually returns a CompactBuffer.
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala#L444
It's mostly manual. You could try automating with something like Chef, of
course, but there's nothing already available in terms of automation.
dean
The convention for a standalone cluster is to use ZooKeeper to manage master
failover.
http://spark.apache.org/docs/latest/spark-standalone.html
I would use the ps command on each machine while the job is running to
confirm that every process involved is running as root.
Are the tasks on the slaves also running as root? If not, that might
explain the problem.
dean
= Pcaps.findAllDevs();
dean
, I do not see it.
On Sun, May 3, 2015 at 9:15 PM, Dean Wampler deanwamp...@gmail.com
wrote:
IMHO, you are trying way too hard to optimize work on what is really a
small data set. 25G, even 250G, is not that much data, especially if you've
spent a month trying to get something to work
last week I wrote one that used a
hash map to track the latest timestamps seen for specific keys.
dean
http://spark.apache.org/docs/latest/sql-programming-guide.html#performance-tuning
and this talk by Michael Armbrust for example,
http://spark-summit.org/wp-content/uploads/2014/07/Performing-Advanced-Analytics-on-Relational-Data-with-Spark-SQL-Michael-Armbrust.pdf.
dean
all the optimizations: Kryo,
partitionBy, etc. Just use the simplest code you can. Make it work first.
Then, if it really isn't fast enough, look for actual evidence of
bottlenecks and optimize those.
Note that each JSON object has to be on a single line in the files.
It's the import statement Olivier showed that makes the method available.
Note that you can also use `sc.createDataFrame(myRDD)`, without the need
for the import statement. I personally prefer this approach.
) 21).show()
I tested and both the $"column" and df("column") syntax work, but I'm
wondering which is *preferred*. Is one the original and one a new
feature we should be using?
Thanks,
Diana
(Spark Curriculum Developer for Cloudera)
If you're running Hadoop, too, now that Hortonworks supports Spark, you
might be able to use their distribution.
Whatever you can do to make this work like table scans and joins will
probably be most efficient.
dean
On 7 April 2015 at 03:33, Dean Wampler deanwamp...@gmail.com wrote:
The log instance won't be serializable, because it will have a file
handle to write to. Try defining another static method
connection, same problem.
You can't suppress the warning because it's actually an error. The
VoidFunction can't be serialized to send it over the cluster's network.
dean
The runtime attempts to serialize everything required by records, and also
any lambdas/closures you use. Small, simple types are less likely to run
into this problem.
Foreach() runs in parallel across the cluster, like map, flatMap, etc.
You'll only run into problems if you call collect(), which brings the
entire RDD into memory in the driver program.
the way
to Scala, so all that noisy code shrinks down to simpler expressions.
You'll be surprised how helpful that is for comprehending your code and
reasoning about it.
dean
Without the rest of your code, it's hard to know what might be
unserializable.
spark.apache.org. Even the Hadoop builds there will work
okay, as they don't actually attempt to run Hadoop commands.
Use JavaSparkContext.parallelize.
http://spark.apache.org/docs/latest/api/java/org/apache/spark/api/java/JavaSparkContext.html#parallelize(java.util.List)
Are you allocating 1 core per input stream plus additional cores for the
rest of the processing? Each input stream Reader requires a dedicated core.
So, if you have two input streams, you'll need local[3] at least.
calling take(1) to grab the first element should also work, even if
the RDD is empty. (It will return an empty array in that case, but not throw
an exception.)
dean
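The same contrast holds for plain Scala collections, which makes it easy to illustrate (RDD.take returns an Array rather than a Seq, but the safety property is the same):

```scala
val empty = Seq.empty[Int]

// take(1) on an empty collection returns an empty result, no exception.
val taken = empty.take(1)

// head (like RDD.first) throws NoSuchElementException on an empty collection.
val headFails =
  try { empty.head; false }
  catch { case _: NoSuchElementException => true }
```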
You're welcome. Two limitations to know about:
1. I haven't updated it to 1.3
2. It uses Scala for all examples (my bias ;), so less useful if you don't
want to use Scala.
to
process it. What's your streaming batch window size?
See also here for ideas:
http://spark.apache.org/docs/1.2.1/streaming-programming-guide.html#performance-tuning
Is it possible tbBER is empty? If so, it shouldn't fail like this, of
course.
This might be overkill for your needs, but the scodec parser combinator
library might be useful for creating a parser.
https://github.com/scodec/scodec
Use the Maven build instead. From the README in the git repo (
https://github.com/apache/spark)
mvn -DskipTests clean package
running Spark
in Mesos, but accessing data in MapR-FS?
Perhaps the MapR shim library doesn't support Spark 1.3.1.
HTH,
dean
Here's our home page: http://www.meetup.com/Chicago-Spark-Users/
Thanks,
Dean
:
spark.mesos.coarse true
Or, from this page
http://spark.apache.org/docs/latest/running-on-mesos.html, set the
property in a SparkConf object used to construct the SparkContext:
conf.set("spark.mesos.coarse", "true")
dean
Most of the 2.11 issues are being resolved in Spark 1.4. For a while, the
Spark project has published maven artifacts that are compiled with 2.11 and
2.10, although the downloads at http://spark.apache.org/downloads.html are
still all for 2.10.
If you don't mind using SBT with your Scala instead of Maven, you can see
the example I created here: https://github.com/deanwampler/spark-workshop
It can be loaded into Eclipse or IntelliJ
integer, then do the filtering and final averaging
downstream if you can, i.e., where you actually need the final value. If
you need it on every batch iteration, then you'll have to do a reduce per
iteration.
either the master service isn't running or isn't reachable
over your network. Is hadoopm0 publicly routable? Is port 7077 blocked? As
a test, can you telnet to it?
telnet hadoopm0 7077
Did you use an absolute path in $path_to_file? I just tried this with
spark-shell v1.4.1 and it worked for me. If the URL is wrong, you should
see an error message from log4j that it can't find the file. For Windows it
would be something like file:/c:/path/to/file, I believe.
Typesafe (http://typesafe.com). We provide commercial support for Spark on
Mesos and Mesosphere DCOS. We contribute to Spark's Mesos integration and
Spark Streaming enhancements.
That's the correct URL. Recent change? The last time I looked, earlier this
week, it still had the obsolete artifactory URL for URL1 ;)
It should work fine. I have an example script here:
https://github.com/deanwampler/spark-workshop/blob/master/src/main/scala/sparkworkshop/SparkSQLParquet10-script.scala
(Spark 1.4.X)
What does "I am failing to do so" mean?
Also, Spark on Mesos supports cluster mode:
http://spark.apache.org/docs/latest/running-on-mesos.html#cluster-mode
Following Hadoop conventions, Spark won't overwrite an existing directory.
You need to provide a unique output path every time you run the program, or
delete or rename the target directory before you run the job.
dean
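One common workaround for the unique-output-path option, sketched here with an illustrative helper (the name `uniqueOutputPath` is not from the original thread), is to append a timestamp so every run writes to a fresh directory:

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// Build a fresh output directory name per run by appending a timestamp.
def uniqueOutputPath(base: String): String = {
  val stamp = LocalDateTime.now.format(DateTimeFormatter.ofPattern("yyyyMMdd-HHmmss"))
  s"$base-$stamp"
}

// e.g. rdd.saveAsTextFile(uniqueOutputPath("hdfs:///results/weblog"))
```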
. Is that really
a mandatory requirement for this problem?
HTH,
dean
Add the other Cassandra dependencies (dse.jar,
spark-cassandra-connect-java_2.10) to your --jars argument on the command
line.
org.apache.spark.streaming.twitter.TwitterInputDStream is a small class.
You could write your own that lets you change the filters at run time. Then
provide a mechanism in your app, like periodic polling of a database table
or file for the list of filters.
So, just before running the job, run this HDFS command at a shell
prompt: hdfs dfs -ls hdfs://172.31.42.10:54310/./weblogReadResult
Does it say the path doesn't exist?
where HelloWorld is found. Confusing, yes it is...
dean
of Zookeeper if you need master failover.
Hence, you don't see it often in production scenarios.
The Spark page on cluster deployments has more details:
http://spark.apache.org/docs/latest/cluster-overview.html
dean
of network overhead. In some situations, a high
performance file system appliance, e.g., NAS, could suffice.
My $0.02,
dean
to write output?
is running, then you can use the Spark web UI on that machine to see
what the Spark job is doing.
dean
, if that works for your every 10-min. need.
/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable/Partition/Column
for full details.
dean
You can certainly start jobs without Chronos, but to automatically restart
finished jobs or to run jobs at specific times or periods, you'll want
something like Chronos.
dean