Yes, it can recover on a different node. It uses a write-ahead log,
checkpoints the offsets of both ingress and egress (e.g. using ZooKeeper and/or
Kafka), and relies on the streaming engine's deterministic operations,
replaying a certain range of data based on the checkpointed
ingress offset (at
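For reference, a minimal sketch of what that setup can look like in the Scala streaming API; the checkpoint directory, app name, and batch interval below are placeholders, not taken from this thread:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val checkpointDir = "hdfs:///tmp/streaming-checkpoint"  // placeholder path

def createContext(): StreamingContext = {
  val conf = new SparkConf()
    .setAppName("recoverable-stream")
    // write received data to a write-ahead log so it survives failures
    .set("spark.streaming.receiver.writeAheadLog.enable", "true")
  val ssc = new StreamingContext(conf, Seconds(10))
  ssc.checkpoint(checkpointDir)  // streaming metadata (including offsets) is checkpointed here
  // ... define deterministic transformations and outputs here ...
  ssc
}

// On restart, possibly on a different node, the context is rebuilt from the
// checkpoint and pending batches are recomputed from the checkpointed offsets.
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()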
From: Ratika Prasad
Sent: Monday, October 05, 2015 2:39 PM
To: u...@spark.apache.org
Cc: Ameeta Jayarajan
Subject: Spark error while running in spark mode
Hi,
When we run our Spark component in cluster mode as below, we get the following
error:
./bin/spark-submit
Hello everyone.
It seems the PySpark DataFrame read is broken for reading multiple files.
sql.read.json("file1,file2") fails with java.io.IOException: No input
paths specified in job.
This used to work in Spark 1.4, and it still works with sc.textFile.
Blaž
Consider the following 2 scenarios:
*Scenario #1*
val pagecounts = sc.textFile("data/pagecounts")
pagecounts.checkpoint
pagecounts.count
*Scenario #2*
val pagecounts = sc.textFile("data/pagecounts")
pagecounts.count
The total time shown in the Spark shell Application UI was different for both scenarios.
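One thing worth noting (an assumption about the setup, not stated above): RDD.checkpoint needs a checkpoint directory to be set, and the checkpointed RDD is recomputed when it is written out unless it is persisted first, which by itself can change the measured time. A small sketch with a placeholder directory:

sc.setCheckpointDir("hdfs:///tmp/rdd-checkpoints")  // required before checkpoint(); placeholder path

val pagecounts = sc.textFile("data/pagecounts")
pagecounts.cache()       // persist first so writing the checkpoint does not recompute the lineage
pagecounts.checkpoint()
pagecounts.count()       // first action runs the job and then writes the checkpoint data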
Hi Spark Devs,
So this has been brought up a few times before, and generally on the user
list people get directed to use spark-testing-base. I'd like to start
moving some of spark-testing-base's functionality into Spark so that people
don't need a library to do what is (hopefully :p) a very
i ran into the same thing in scala api. we depend heavily on comma
separated paths, and it no longer works.
On Tue, Oct 6, 2015 at 3:02 AM, Blaž Šnuderl wrote:
> Hello everyone.
>
> It seems pyspark dataframe read is broken for reading multiple files.
>
> sql.read.json(
I think the problem is that a comma is actually a legitimate character in a
file name, and as a result ...
On Tuesday, October 6, 2015, Josh Rosen wrote:
> Could someone please file a JIRA to track this?
> https://issues.apache.org/jira/browse/SPARK
>
> On Tue, Oct 6, 2015 at
i personally find the comma separated paths feature much more important
than commas in paths (which one could argue you should avoid).
but assuming people want to keep commas as legitimate characters in paths:
https://issues.apache.org/jira/browse/SPARK-10185
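Until that is sorted out, one possible workaround (not suggested in the thread) leans on sc.textFile still accepting comma-separated paths, as noted above: read the files as text and feed the JSON lines to the reader. The paths below are placeholders:

// sc.textFile still splits on commas, so multiple JSON files can be loaded that way
val lines = sc.textFile("data/file1.json,data/file2.json")  // placeholder paths
val df = sqlContext.read.json(lines)  // build the DataFrame from the RDD of JSON strings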
Davies,
That seemed to be my issue; my colleague helped me resolve it. The
problem was that we build the RDD and the corresponding StructType
ourselves (no JSON, Parquet, Cassandra, etc. - we take a list of business
objects and convert them to Rows, then infer the struct type), and I missed one
thing.
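For anyone hitting something similar, a minimal sketch of the pattern being described: building Rows and a StructType by hand and creating the DataFrame from them. The field names and types here are made up for illustration:

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

// hypothetical business objects already converted to Rows
val rows = sc.parallelize(Seq(Row(1L, "alice"), Row(2L, "bob")))

// the schema must match the Row contents exactly (field order, types, nullability)
val schema = StructType(Seq(
  StructField("id", LongType, nullable = false),
  StructField("name", StringType, nullable = true)
))

val df = sqlContext.createDataFrame(rows, schema)
df.printSchema()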
Could someone please file a JIRA to track this?
https://issues.apache.org/jira/browse/SPARK
On Tue, Oct 6, 2015 at 1:21 AM, Koert Kuipers wrote:
> i ran into the same thing in scala api. we depend heavily on comma
> separated paths, and it no longer works.
>
>
> On Tue, Oct
Unfortunately, there is no obvious way to do this. I am guessing that
you want to partition your stream such that the same keys always go to the
same executor, right?
You could do it by writing a custom RDD. See ShuffledRDD.
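As a rough illustration of the partitioner route (a sketch under assumptions: keyedStream stands for an existing DStream of key-value pairs and numPartitions is chosen by the application; this keeps equal keys in the same partition, though Spark does not pin a partition to a particular executor):

import org.apache.spark.HashPartitioner
import org.apache.spark.streaming.dstream.DStream

// keyedStream is a placeholder for an existing DStream of key-value pairs
def colocateKeys(keyedStream: DStream[(String, Int)], numPartitions: Int): DStream[(String, Int)] =
  keyedStream.transform { rdd =>
    // identical keys land in the same partition; partitionBy uses a ShuffledRDD internally
    rdd.partitionBy(new HashPartitioner(numPartitions))
  }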
The current implementation of multiple count distinct aggregations in a single
query is quite poor in terms of performance and robustness, and it is also hard
to guarantee the correctness of the implementation in some of the refactorings
for Tungsten. Supporting a better version of it is possible in the future,
To provide more context, if we do remove this feature, the following SQL
query would throw an AnalysisException:
select count(distinct colA), count(distinct colB) from foo;
The following should still work:
select count(distinct colA) from foo;
The following should also work:
select
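Until a better implementation lands, one workaround (not from this thread) is to run each distinct count as its own aggregation, e.g. from the Scala API; the table and column names below are taken from the example query above:

// A possible workaround: compute each distinct count in a separate query
val countA = sqlContext.sql("select count(distinct colA) from foo").first().getLong(0)
val countB = sqlContext.sql("select count(distinct colB) from foo").first().getLong(0)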
Hey Holden,
It would be helpful if you could outline the set of features you'd imagine
being part of Spark in a short doc. I didn't see a README on the existing
repo, so it's hard to know exactly what is being proposed.
As a general point of process, we've typically avoided merging modules into
I'll put together a Google doc and send that out (in the meantime, a quick
guide of sorts to how the current package can be used is in the blog post I
did at
http://blog.cloudera.com/blog/2015/09/making-apache-spark-testing-easy-with-spark-testing-base/
). If people think it's better to keep it as a
User-defined functions written in R are not supported yet. You can implement
your UDF in Scala, register it in the sqlContext, and use it in SparkR, provided
that you share your context between R and Scala.
--Hossein
On Friday, October 2, 2015, Renyi Xiong wrote:
> Hi Shiva,
>
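A minimal sketch of the approach Hossein describes (the UDF name and logic here are made up, and it assumes the same sqlContext is shared with the R session): register the function in Scala, then call it from SparkR through SQL.

// Scala side: register a UDF on the shared sqlContext
sqlContext.udf.register("toUpper", (s: String) => s.toUpperCase)

// R side (for reference): the registered UDF can then be used in a SQL query, e.g.
//   df <- sql(sqlContext, "SELECT toUpper(name) FROM people")   // "people" is a placeholder table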
Anyone know about this? TD?
-yogesh
> On 30-Sep-2015, at 1:25 pm, Yogs wrote:
>
> Hi,
>
> We intend to run ad-hoc windowed continuous queries on Spark Streaming data.
> The queries could be registered/deregistered dynamically or can be submitted
> through