Hi,
I also noticed this issue. Actually, it has already been mentioned several
times; there is an existing JIRA (SPARK-17914).
I am going to submit a PR to fix this in a few days.
Best,
Anton
On Jun 5, 2017 21:42, "verbamour" wrote:
> Greetings,
>
> I am using Hive
Hi,
I recently extended the Spark SQL programming guide to cover user-defined
aggregations, where I mutated the existing buffer objects and returned them
in reduce and merge. This approach worked, and it was approved by people who
know the context.
Hope that helps.
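In case it is useful, here is a minimal sketch of the approach (the
Employee/Average types are just for illustration):

import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator

case class Employee(name: String, salary: Long)
case class Average(var sum: Long, var count: Long)

object MyAverage extends Aggregator[Employee, Average, Double] {
  // The initial buffer for the aggregation.
  def zero: Average = Average(0L, 0L)
  // Mutate the existing buffer and return it instead of allocating a new one.
  def reduce(buffer: Average, employee: Employee): Average = {
    buffer.sum += employee.salary
    buffer.count += 1
    buffer
  }
  // The same idea when merging two intermediate buffers.
  def merge(b1: Average, b2: Average): Average = {
    b1.sum += b2.sum
    b1.count += b2.count
    b1
  }
  def finish(reduction: Average): Double = reduction.sum.toDouble / reduction.count
  def bufferEncoder: Encoder[Average] = Encoders.product
  def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}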
2017-01-29 17:17 GMT+01:00 Koert
Hi,
take a look at this pull request, which is not merged yet:
https://github.com/apache/spark/pull/16329 . It contains examples in Java
and Scala that can be helpful.
Best regards,
Anton Okolnychyi
On Jan 4, 2017 23:23, "Anil Langote" <anillangote0...@gmail.com> wrote:
> Hi A
> > KafkaUtils.createDirectStream.
> > 2) If you assign the same group id to several consumer instances, then all
> > the consumers will get different sets of messages on the same topic. This is
> > a kind of load balancing which Kafka provides with its Consumer API.
> https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html
>
> AFAIK, yes, you should use a unique group id for each stream (Kafka 0.10!)
>
>> kafkaParams.put("group.id", "use_a_separate_group_id_for_each_stream");
>>
>>
>
> On Sun, Dec 1
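For reference, a minimal sketch of what a separate group id per stream looks
like with the 0-10 integration (the broker address, topic name, and group ids
are placeholders):

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

def kafkaParams(groupId: String): Map[String, Object] = Map[String, Object](
  "bootstrap.servers" -> "localhost:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> groupId)

// Each stream gets its own group id, so the two consumers do not split the
// topic's partitions between each other.
def createStreams(ssc: StreamingContext) = {
  val stream1 = KafkaUtils.createDirectStream[String, String](
    ssc, PreferConsistent,
    Subscribe[String, String](Seq("topic"), kafkaParams("group-for-stream-1")))
  val stream2 = KafkaUtils.createDirectStream[String, String](
    ssc, PreferConsistent,
    Subscribe[String, String](Seq("topic"), kafkaParams("group-for-stream-2")))
  (stream1, stream2)
}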
Hi,
I am experimenting with Spark Streaming and Kafka. I would appreciate it if
someone could say whether the following assumption is correct.
If I have multiple computations (each with its own output) on one stream
(created via KafkaUtils.createDirectStream), then there is a chance to have
Hi guys,
I also experienced a situation in which Spark 1.6.2 ignored my hint to do a
broadcast join (i.e. broadcast(df)) with a small dataset. However, this
happened in only 1 of 3 cases. Setting the
"spark.sql.autoBroadcastJoinThreshold" property did not have any impact
either. All 3 cases work
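For context, the hint in question is the broadcast function applied to the
small side of the join (a sketch; the DataFrames and the join key are
placeholders):

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.broadcast

// Ask the planner to broadcast smallDf regardless of its estimated size.
def hintedJoin(largeDf: DataFrame, smallDf: DataFrame): DataFrame =
  largeDf.join(broadcast(smallDf), "id")

// The property mentioned above: the size in bytes below which Spark may
// broadcast a relation on its own (-1 disables automatic broadcast joins).
// sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold", (10 * 1024 * 1024).toString)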
Hi,
I have experienced a problem using the Datasets API in Spark 1.6, while
almost identical code works fine in Spark 2.0.
The problem is related to encoders and custom aggregators.
*Spark 1.6 (the aggregation produces an empty map):*
implicit val intStringMapEncoder: Encoder[Map[Int,
correct result.
(33, Map(1 -> 1, 2 -> 2))
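For the record, here is a reconstruction of the setup; the map's value type
and the aggregation logic are my assumptions based on the output above, not
the original code:

import org.apache.spark.sql.Encoder
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
import org.apache.spark.sql.expressions.Aggregator

implicit val intStringMapEncoder: Encoder[Map[Int, String]] = ExpressionEncoder()

// A custom aggregator that merges per-row maps into a single map.
val mergeMaps = new Aggregator[Map[Int, String], Map[Int, String], Map[Int, String]] {
  def zero: Map[Int, String] = Map.empty
  def reduce(b: Map[Int, String], a: Map[Int, String]): Map[Int, String] = b ++ a
  def merge(b1: Map[Int, String], b2: Map[Int, String]): Map[Int, String] = b1 ++ b2
  def finish(r: Map[Int, String]): Map[Int, String] = r
  def bufferEncoder: Encoder[Map[Int, String]] = intStringMapEncoder
  def outputEncoder: Encoder[Map[Int, String]] = intStringMapEncoder
}.toColumn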
Any ideas/suggestions are more than welcome.
Sincerely,
Anton Okolnychyi
I tried the rank API; however, this is not the API I want, because rows
> with the same pv value are ranked the same. The first 50 rows
> of each frame are what I'm expecting. The attached file shows what I got by
> using rank.
> Thank you anyway, I learnt what rank could pro
The following resources should be useful:
https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-sql-windows.html
The last link should have the exact solution.
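In particular, for taking exactly the first 50 rows of each frame despite
ties, row_number (rather than rank) should do it; a sketch, where "category"
and "pv" are placeholder column names:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

// row_number assigns distinct consecutive numbers even to tied rows, so
// keeping numbers <= 50 yields exactly 50 rows per partition.
def top50PerFrame(df: DataFrame): DataFrame = {
  val w = Window.partitionBy("category").orderBy(col("pv").desc)
  df.withColumn("rn", row_number().over(w)).filter(col("rn") <= 50).drop("rn")
}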
2016-07-06 16:55 GMT+02:00 Tal
in the documentation where this behavior is
described? If not, would it be appropriate for me to try to find where
this can be done?
- Would it be appropriate/useful to add some window function examples to
spark/examples? There are none so far.
Sincerely,
Anton Okolnychyi