Dear Spark developers,
I am working with Spark Streaming 1.6.1. The task is to get the RDD from each
time window and feed it to some external analytics. This external function
accepts an RDD, so I cannot pass it a DStream. I learned that
DStream.window.compute(time) returns Option[RDD]. I am trying to use it in the
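For what it's worth, a minimal sketch of the more common route: let foreachRDD hand
each window's batch to the RDD-only function instead of calling compute(time) by hand
(the function name, host/port source, and durations are illustrative):

import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

object WindowedAnalytics {
  // Hypothetical external function that only accepts an RDD.
  def externalAnalytics(rdd: RDD[String]): Unit = println(s"count = ${rdd.count()}")

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("windowed-analytics").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))
    val lines = ssc.socketTextStream("localhost", 9999)

    // foreachRDD exposes each window's data as a plain RDD, so the external,
    // RDD-only function can be called without touching window.compute(time).
    lines.window(Seconds(60), Seconds(10)).foreachRDD(rdd => externalAnalytics(rdd))

    ssc.start()
    ssc.awaitTermination()
  }
}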
Are there any examples of how to use StateStore with DStreams? It seems
like the idea would be to create a new version with each minibatch, but I
don't quite know how to make that happen. My lame attempt is below.
def run(ss: SparkSession): Unit = {
val c = new
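Not a StateStore example, but for reference the public DStream API that keeps
versioned per-key state across minibatches is mapWithState; a minimal sketch,
with the socket source, checkpoint path and running-count logic purely illustrative:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, State, StateSpec, StreamingContext}

object RunningCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("running-count").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))
    ssc.checkpoint("/tmp/streaming-checkpoint")   // mapWithState requires checkpointing

    val pairs = ssc.socketTextStream("localhost", 9999)
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))

    // The state is carried forward and updated once per minibatch.
    val mappingFunc = (word: String, one: Option[Int], state: State[Int]) => {
      val sum = one.getOrElse(0) + state.getOption.getOrElse(0)
      state.update(sum)
      (word, sum)
    }
    pairs.mapWithState(StateSpec.function(mappingFunc)).print()

    ssc.start()
    ssc.awaitTermination()
  }
}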
It's basically the output of the explain command.
On Wed, Aug 24, 2016 at 12:31 PM, Maciej Bryński wrote:
> Hi,
> I read this article:
> https://databricks.com/blog/2015/04/13/deep-dive-into-spark-sqls-catalyst-optimizer.html
>
> And I have a question. Is it possible to
If you're just varying versions (or things that can be controlled by a
profile, which is most everything including dependencies), you don't
need and probably don't want multiple POM files. Even that wouldn't
mean you can't use classifiers.
I have seen it used for HBase and core Hadoop. I am not sure
Hi,
Do you plan to add a tag for this release on GitHub?
https://github.com/graphframes/graphframes/releases
Regards,
Maciek
2016-08-17 3:18 GMT+02:00 Jacek Laskowski :
> Hi Tim,
>
> AWESOME. Thanks a lot for releasing it. That makes me even more eager
> to see it in Spark's
Have you seen any successful applications of this for Spark 1.x/2.x?
From the doc "The classifier allows to distinguish artifacts that were
built from the same POM but differ in their content."
We'd be building from different POMs, since we'd be modifying the Spark
dependency version (and
This is also what "classifiers" are for in Maven, to have variations
on one artifact and version. https://maven.apache.org/pom.html
It has been used to ship code for Hadoop 1 vs 2 APIs.
In a way it's the same idea as Scala's "_2.xx" naming convention, with
a less unfortunate implementation.
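For illustration, this is how a classifier is consumed from an sbt build; the
avro-mapred/hadoop2 artifact is a real instance of the Hadoop 1-vs-2 case
mentioned above (whether the same trick fits a Spark 1-vs-2 split is exactly
the open question in this thread):

// build.sbt: same group, artifact and version, but a different jar
// is selected via a classifier (here the Hadoop 2 build of avro-mapred).
libraryDependencies += "org.apache.avro" % "avro-mapred" % "1.7.7" classifier "hadoop2"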
On
Looks like in general people like it. Next step is for somebody to take
the lead and implement it.
Tom do you have cycles to do this?
On Wednesday, August 24, 2016, Tom Graves wrote:
> ping, did this discussion conclude or did we decide what we are doing?
>
> Tom
>
>
>
Ah yes, thank you for the clarification.
On Wed, Aug 24, 2016 at 11:44 AM, Ted Yu wrote:
> 'Spark 1.x and Scala 2.10 & 2.11' was repeated.
>
> I guess your second line should read:
>
> org.bdgenomics.adam:adam-{core,apis,cli}-spark2_2.1[0,1] for Spark 2.x
> and Scala 2.10 & 2.11
'Spark 1.x and Scala 2.10 & 2.11' was repeated.
I guess your second line should read:
org.bdgenomics.adam:adam-{core,apis,cli}-spark2_2.1[0,1] for Spark 2.x and
Scala 2.10 & 2.11
On Wed, Aug 24, 2016 at 9:41 AM, Michael Heuer wrote:
> Hello,
>
> We're a project downstream
Hello,
We're a project downstream of Spark and need to provide separate artifacts
for Spark 1.x and Spark 2.x. Has any convention been established or even
proposed for artifact names and/or qualifiers?
We are currently thinking
org.bdgenomics.adam:adam-{core,apis,cli}_2.1[0,1] for Spark 1.x
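Under that scheme a downstream sbt build would select the Spark line by artifact
name, roughly as below (the version placeholder x.y.z is illustrative):

// Spark 1.x build, cross-built for Scala 2.10 / 2.11 via %%:
libraryDependencies += "org.bdgenomics.adam" %% "adam-core" % "x.y.z"
// Spark 2.x build, distinguished by a "-spark2" suffix on the artifact name:
libraryDependencies += "org.bdgenomics.adam" %% "adam-core-spark2" % "x.y.z"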
FYI, I've updated the issue's description to include a very simple program
which reproduces the issue for me.
Cheers,
Michael
> On Aug 23, 2016, at 4:54 PM, Michael Allman wrote:
>
> I've replied on the issue's page, but in a word, "yes". See
>
ping, did this discussion conclude or did we decide what we are doing?
Tom
On Friday, May 13, 2016 3:19 PM, Michael Armbrust
wrote:
+1 to the general structure of Reynold's proposal. I've found what we do
currently a little confusing. In particular, it
On Wed, Aug 24, 2016 at 2:32 PM, Steve Loughran wrote:
> no reason; the key thing is: not in cluster mode, as there your work happens
> elsewhere
Right! Anything but cluster mode should make it easy (that leaves us
with local).
Jacek
You are saying the RDD lineage must be serialized, otherwise we could not
recreate it after a node failure. This is false. The RDD lineage is not
serialized. It is only relevant to the driver application and as such it is
just kept in memory in the driver application. If the driver application
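A small illustration (the input path is illustrative): the lineage below is just a
graph of RDD objects in the driver's JVM, and it can be inspected with toDebugString
before any job has run or anything has been serialized.

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("lineage-demo").setMaster("local[*]"))

// Each transformation records its parent, forming the lineage: a chain of RDD
// objects that lives only in the driver.
val base    = sc.textFile("hdfs:///logs")
val errors  = base.filter(_.contains("ERROR"))
val lengths = errors.map(_.length)

// Printed entirely from driver-side metadata; no tasks have been launched yet.
println(lengths.toDebugString)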
> On 24 Aug 2016, at 11:38, Jacek Laskowski wrote:
>
> On Wed, Aug 24, 2016 at 11:13 AM, Steve Loughran
> wrote:
>
>> I'd recommend
>
> ...which I mostly agree to with some exceptions :)
>
>> -start spark standalone from there
>
> Why spark
On Wed, Aug 24, 2016 at 11:13 AM, Steve Loughran wrote:
> I'd recommend
...which I mostly agree to with some exceptions :)
> -start spark standalone from there
Why Spark Standalone, since the OP asked about "learning how query
execution flow occurs in Spark SQL"? How
On 24 Aug 2016, at 07:10, Nishadi Kirielle
> wrote:
Hi,
I'm engaged in learning how query execution flow occurs in Spark SQL. In order
to understand the query execution flow, I'm attempting to run an example in
debug mode with IntelliJ IDEA. It
can you please elaborate a bit more?
On Wed, Aug 24, 2016 12:41 AM, Sean Owen so...@cloudera.com wrote:
Byte code, no. It's sufficient to store the information that the RDD represents,
which can include serialized function closures, but that's not quite storing
byte code.
On Wed, Aug 24,
Byte code, no. It's sufficient to store the information that the RDD
represents, which can include serialized function closures, but that's not
quite storing byte code.
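A tiny example of the distinction, runnable in the spark-shell (where sc is provided):
the closure object passed to map, together with the value it captures, is what gets
serialized to executors; the lambda's byte code is already on the executors'
classpath inside the application jar.

// factor is captured by the closure below; when the job runs, Spark serializes
// that closure object (including the captured value) and ships it to executors.
val factor = 3
val scaled = sc.parallelize(1 to 5).map(_ * factor)
scaled.collect()   // returns Array(3, 6, 9, 12, 15)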
On Wed, Aug 24, 2016 at 2:00 AM, kant kodali wrote:
> Hi Guys,
>
> I have this question for a very long
Hi,
I'm engaged in learning how query execution flow occurs in Spark SQL. In
order to understand the query execution flow, I'm attempting to run an
example in debug mode with IntelliJ IDEA. It would be great if anyone can
help me with debug configurations.
Thanks & Regards
Nishadi
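For setting breakpoints, a self-contained program like the sketch below (object
name and data are illustrative), run in local mode inside the IDE, lets you step
from explain(true) into Catalyst's analysis, optimization and planning phases:

import org.apache.spark.sql.SparkSession

object ExplainDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("explain-demo")
      .master("local[*]")          // local mode keeps the whole flow in one debuggable JVM
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, "a"), (2, "b")).toDF("id", "name")

    // Prints the parsed, analyzed and optimized logical plans plus the physical
    // plan, i.e. the stages Catalyst walks through for this query.
    df.filter($"id" > 1).explain(true)

    spark.stop()
  }
}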
On Tue, Jun