Hi Michal,
yes, it is logged twice there; you can see it in the log attached to one of
the previous posts, with more details:
15/09/17 23:06:37 INFO StreamingContext: Invoking stop(stopGracefully=false) from shutdown hook
15/09/17 23:06:37 INFO StreamingContext: Invoking stop(stopGracefully=false) from
I think generally the way forward would be to put the aggregate statistics
into external storage (e.g. HBase); it should not have that much influence on
latency. You will probably need it anyway if you need to store historical
information. W.r.t. deltas: always a tricky topic. You may want to work
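For example, a minimal sketch of that write path from the streaming side. Both aggregates (a DStream of (key, value) stats) and ExternalStore (standing in for an HBase or other NoSQL client) are hypothetical here; the point is opening the connection per partition so it is not serialized with the closure:

  aggregates.foreachRDD { rdd =>
    rdd.foreachPartition { partition =>
      val store = ExternalStore.connect()  // hypothetical client, one connection per partition
      partition.foreach { case (key, value) => store.put(key, value) }
      store.close()
    }
  }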
Got it, thanks Reynold.
On 20 Sep 2015 07:08, "Reynold Xin" wrote:
> The RDDs themselves are not materialized, but the implementations can
> materialize.
>
> E.g. in cogroup (which is used by RDD.join), it materializes all the data
> during grouping.
>
> In SQL/DataFrame
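A toy version of that cogroup shape, just to make the materialization visible (same structure, not Spark's actual source; assumes an existing SparkContext sc):

  val left = sc.parallelize(Seq((1, "a"), (1, "b")))
  val right = sc.parallelize(Seq((1, "x")))
  // both Iterables for a key are fully built before the cross product is emitted
  val joined = left.cogroup(right).flatMapValues {
    case (vs, ws) => for (v <- vs; w <- ws) yield (v, w)
  }
  joined.collect()  // Array((1,(a,x)), (1,(b,x)))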
Hi Thuy,
You can check RDD.lookup(). It requires that the RDD is partitioned and, of
course, cached in memory. Or you may consider a distributed cache like
Ehcache or AWS ElastiCache.
I think external storage is an option, too, especially NoSQL databases;
they can handle updates at high speed, at
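For reference, a minimal lookup() sketch (assumes an existing SparkContext sc):

  import org.apache.spark.HashPartitioner

  // pair RDD partitioned by key, then cached so lookups hit memory
  val users = sc.parallelize(Seq(("u1", "Alice"), ("u2", "Bob")))
    .partitionBy(new HashPartitioner(4))
    .cache()

  // with a known partitioner, lookup scans only the one matching partition
  users.lookup("u1")  // Seq("Alice")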
val topics="first"
shouldn't it be val topics = Set("first") ?
On Sun, Sep 20, 2015 at 1:01 PM, Petr Novak wrote:
> val topics="first"
>
> shouldn't it be val topics = Set("first") ?
>
> On Sat, Sep 19, 2015 at 10:07 PM, kali.tumm...@gmail.com <
> kali.tumm...@gmail.com>
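For reference, a minimal direct-stream setup against the Spark 1.x Kafka API (broker address, app name and batch interval are placeholders):

  import kafka.serializer.StringDecoder
  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}
  import org.apache.spark.streaming.kafka.KafkaUtils

  val ssc = new StreamingContext(new SparkConf().setAppName("kafka-test"), Seconds(5))
  val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
  val topics = Set("first")  // a Set[String], not a plain String
  val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
    ssc, kafkaParams, topics)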
We do - but I don't think it is feasible to duplicate every single
algorithm in DF and in RDD.
The only way for this to work is to make one underlying implementation work
for both. Right now DataFrame knows how to serialize individual elements
well and can manage memory that way -- the RDD API
When I used "sbt/sbt assembly" to compile the Spark 1.5.0 code, I got a
problem and I did not know why. It says:
NOTE: The sbt/sbt script has been relocated to build/sbt.
Please update references to point to the new location.
Invoking 'build/sbt assembly' now ...
Using
Have you seen this thread:
http://search-hadoop.com/m/q3RTtVJJ3I15OJ251
Cheers
On Sun, Sep 20, 2015 at 6:11 PM, Aaroncq4 <475715...@qq.com> wrote:
> When I used "sbt/sbt assembly" to compile the Spark 1.5.0 code, I got a
> problem and I did not know why. It says:
>
> NOTE: The sbt/sbt
Hi,
If your input format is user -> comment, then you could:
val comments = sc.parallelize(List(("u1", "one two one"), ("u2", "three four three")))
val wordCounts = comments.
  flatMap { case (user, comment) =>
    for (word <- comment.split(" ")) yield ((user, word), 1) }.
  reduceByKey(_ + _)  // sum the 1s per (user, word); presumably where the truncated mail was heading
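Collecting that gives the per-(user, word) counts (ordering may differ):

  wordCounts.collect()
  // Array(((u1,one),2), ((u1,two),1), ((u2,three),2), ((u2,four),1))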
Having to restructure my queries isn't a very satisfactory solution,
unfortunately.
I did notice that if I implement the CatalystScan interface instead, then
the filters DO get passed in, but the column identifiers would need to be
translated somewhat to be usable, so that's another option.
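A rough sketch of that route (the relation and schema are hypothetical; only the buildScan signature comes from the CatalystScan trait):

  import org.apache.spark.rdd.RDD
  import org.apache.spark.sql.{Row, SQLContext}
  import org.apache.spark.sql.catalyst.expressions.{Attribute, Expression}
  import org.apache.spark.sql.sources.{BaseRelation, CatalystScan}
  import org.apache.spark.sql.types._

  // hypothetical relation, just to show where the translation happens
  class MyRelation(override val sqlContext: SQLContext)
      extends BaseRelation with CatalystScan {

    override def schema: StructType = StructType(Seq(
      StructField("id", LongType), StructField("name", StringType)))

    // unlike PrunedFilteredScan, the raw Catalyst expressions arrive here;
    // the Attributes carry expression IDs and must be mapped back to plain names
    override def buildScan(requiredColumns: Seq[Attribute],
                           filters: Seq[Expression]): RDD[Row] = {
      val columnNames = requiredColumns.map(_.name)
      // ... hand columnNames and the translated filters to the real source
      sqlContext.sparkContext.emptyRDD[Row]
    }
  }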
why dont we want these (broadcast join and sort-merge join) in DataFrame
but not in RDD?
they dont seem specific to structured data analysis to me.
On Sun, Sep 20, 2015 at 2:41 AM, Rishitesh Mishra
wrote:
> Got it, thanks Reynold.
> On 20 Sep 2015 07:08, "Reynold Xin"
sorry that was a typo. i meant to say:
why do we have these features (broadcast join and sort-merge join) in
DataFrame but not in RDD?
they don't seem specific to structured data analysis to me.
thanks! koert
On Sun, Sep 20, 2015 at 2:46 PM, Koert Kuipers wrote:
> why dont
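For what it's worth, a broadcast join is easy enough to hand-roll on RDDs (a sketch, assuming an existing SparkContext sc): collect the small side, broadcast it, and join map-side without a shuffle:

  val small = sc.parallelize(Seq((1, "x"), (2, "y"))).collectAsMap()
  val smallBc = sc.broadcast(small)

  val large = sc.parallelize(Seq((1, "a"), (2, "b"), (3, "c")))
  // keys missing from the small side are dropped, like an inner join
  val joined = large.flatMap { case (k, v) =>
    smallBc.value.get(k).map(w => (k, (v, w)))
  }
  joined.collect()  // Array((1,(a,x)), (2,(b,y)))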