The code below was introduced by SPARK-7673 / PR #6225
See item #1 in the description of the PR.
Cheers
On Sat, Oct 24, 2015 at 12:59 AM, Koert Kuipers wrote:
> the code that seems to flatMap directories to all the files inside is in
> the private
Hi,
I have raised a JIRA (https://issues.apache.org/jira/browse/SPARK-11045)
to track the discussion, and am also mailing the user group.
This Kafka consumer has been around for a while on spark-packages (
http://spark-packages.org/package/dibbhatt/kafka-spark-consumer ) and I see
many have started using it. I
Hi
I am using Spark Streaming in Java. One of the problems I have is that I need
to save Twitter statuses in JSON format as I receive them.
When I run the following code on my local machine, it works; however, all the
output files are created in the current directory of the driver program.
Clearly not a
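The usual fix is to give the save call an absolute URI on shared storage (e.g. an hdfs:// prefix for DStream.saveAsTextFiles) rather than a relative path, which resolves against the driver's or worker's working directory. As an illustration of the path issue only, here is a plain-Java sketch (no Spark) that writes one micro-batch of JSON strings under an explicit base directory; JsonBatchWriter and all names in it are hypothetical:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class JsonBatchWriter {
    // Writes one micro-batch of JSON status strings under an explicit,
    // absolute base directory, mimicking the "prefix-<batchTime>" layout
    // that saveAsTextFiles produces. With a relative prefix the files land
    // in the current working directory instead -- the behaviour observed above.
    public static Path writeBatch(Path baseDir, long batchTimeMs, List<String> jsonStatuses)
            throws IOException {
        Path batchDir = baseDir.resolve("statuses-" + batchTimeMs);
        Files.createDirectories(batchDir);
        Path part = batchDir.resolve("part-00000");
        Files.write(part, jsonStatuses, StandardCharsets.UTF_8);
        return part;
    }
}
```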
I wanted to understand something about the internals of Spark Streaming
execution.
If I have a stream X, and in my program I send stream X to function A and
function B:
1. In function A, I do a few transform/filter operations etc. on X -> Y -> Z
to create stream Z. Now I do a foreach operation on Z
If you execute the collect step (foreach in 1, possibly reduce in 2) in two
threads in the driver then both of them will be executed in parallel.
Whichever gets submitted to Spark first gets executed first - you can use a
semaphore if you need to ensure the ordering of execution, though I would
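A minimal plain-Java sketch of that semaphore pattern (no Spark involved; the two Runnables stand in for the foreach and reduce actions, and all names here are illustrative):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Semaphore;

public class OrderedActions {
    // Run two "collect-like" actions on separate driver threads, but force
    // action A to finish before action B starts by handing B a permit only
    // after A completes.
    public static List<String> runOrdered() throws InterruptedException {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        Semaphore gate = new Semaphore(0); // B may not start until A releases

        Thread a = new Thread(() -> {
            log.add("A: foreach done");    // stands in for Z.foreach(...)
            gate.release();                // let B proceed
        });
        Thread b = new Thread(() -> {
            gate.acquireUninterruptibly(); // blocks until A has finished
            log.add("B: reduce done");     // stands in for the reduce in 2
        });

        b.start(); // even though B is started first...
        a.start();
        a.join();
        b.join();
        return new ArrayList<>(log);       // ...A always appears first
    }
}
```

Without the semaphore, whichever action reaches the scheduler first runs first; the permit makes the ordering deterministic regardless of submission order.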
Hi All,
In SQL, say for example I have table1 (movieid) and table2 (movieid, moviename).
In SQL we write something like: select moviename, movieid, count(1) from
table2 inner join table1 on table1.movieid = table2.movieid group by
, here in SQL table1 has only one column whereas table2 has
On 24 Oct 2015, at 00:46, Lin Zhao wrote:
I have Spark on YARN deployed using Cloudera Manager 5.4. The installation
went smoothly. But when I try to run spark-shell I get a long list of
exceptions saying "failed to bind to: /public_ip_of_host:0"
I specified spark.cores.max = 4
but it started 2 executors with 2 cores each on each of the 2 workers.
in standalone cluster mode, though we can specify Worker cores, there is no
way to specify the number of cores an executor must take on that particular
worker machine.
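For reference, the setting discussed above is spelled spark.cores.max, and it caps the application as a whole rather than any one executor. A hedged submit-time sketch (master host, class, and jar are placeholders):

```shell
# Cap the total cores this application may use across the standalone cluster.
# spark.cores.max limits the app as a whole; it does not control how many of
# those cores any single executor on a given worker takes.
spark-submit \
  --master spark://master-host:7077 \
  --conf spark.cores.max=4 \
  --class com.example.MyApp \
  my-app.jar
```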
On Sat, Oct 24, 2015 at 1:41
the code that seems to flatMap directories to all the files inside is in
the private HadoopFsRelation.buildScan:
    // First assumes `input` is a directory path, and tries to get all
    // files contained in it.
    fileStatusCache.leafDirToChildrenFiles.getOrElse(
      path,
      //
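What buildScan is doing there, expanding a directory into all the files beneath it, can be sketched in plain Java (names here are illustrative, not Spark's; the real code additionally caches the leaf-dir-to-children mapping):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LeafFiles {
    // First assumes `input` is a directory path and returns every regular
    // file under it, recursively; if `input` is itself a file, returns just
    // that file -- the "flatMap directories to all the files inside"
    // behaviour described in the thread.
    public static List<Path> listLeafFiles(Path input) throws IOException {
        if (Files.isRegularFile(input)) {
            return List.of(input);
        }
        try (Stream<Path> walk = Files.walk(input)) {
            return walk.filter(Files::isRegularFile)
                       .collect(Collectors.toList());
        }
    }
}
```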
How many rows are you joining? How many rows in the output?
Regards
Sab
On 24-Oct-2015 2:32 am, "pratik khadloya" wrote:
> Actually the groupBy is not taking a lot of time.
> The join that I do later takes most (95%) of the time.
> Also, the grouping I am doing is
Hi All,
I am trying to write an RDD as a Sequence file into my Hadoop cluster but am
getting connection timeouts again and again. I can ping the Hadoop cluster,
and the directory also gets created with the file name I specify. I believe I
am missing some configuration; kindly help me
object
A better wiki entry: https://wiki.apache.org/hadoop/BindException