Re: Incorrect CAST to TIMESTAMP in Hive compatibility

2017-06-05 Thread Anton Okolnychyi
Hi, I also noticed this issue. Actually, it has already been mentioned several times. There is an existing JIRA (SPARK-17914). I am going to submit a PR to fix this in a few days. Best, Anton On Jun 5, 2017 21:42, "verbamour" wrote: > Greetings, > > I am using Hive
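A minimal sketch of the kind of cast under discussion, assuming a local SparkSession; the literal below is only illustrative, and the exact inputs that trigger the bug are listed in SPARK-17914:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("timestamp-cast-check")
      .master("local[*]")
      .getOrCreate()

    // SPARK-17914 concerns string-to-timestamp casts where the string
    // carries more fractional-second digits than Spark's microsecond
    // precision; affected versions mishandled the fractional part.
    spark.sql("SELECT CAST('2016-12-12 12:12:12.123456789' AS TIMESTAMP)").show(false)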

Re: Aggregator mutate b1 in place in merge

2017-01-29 Thread Anton Okolnychyi
Hi, I recently extended the Spark SQL programming guide to cover user-defined aggregations, where I modified the existing buffer variables and returned them in reduce and merge. This approach worked and was approved by people who know the context. Hope that helps. 2017-01-29 17:17 GMT+01:00 Koert
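For context, a minimal sketch in the spirit of that programming-guide example; the names here are mine, not the guide's. Both reduce and merge mutate the incoming buffer and return it, which is safe because Spark owns the buffers at those points:

    import org.apache.spark.sql.{Encoder, Encoders}
    import org.apache.spark.sql.expressions.Aggregator

    case class AvgBuffer(var sum: Long, var count: Long)

    object MyAverage extends Aggregator[Long, AvgBuffer, Double] {
      def zero: AvgBuffer = AvgBuffer(0L, 0L)

      // Mutate the buffer in place and return it.
      def reduce(b: AvgBuffer, a: Long): AvgBuffer = {
        b.sum += a; b.count += 1; b
      }

      // Mutating b1 in place in merge, as the thread title asks about.
      def merge(b1: AvgBuffer, b2: AvgBuffer): AvgBuffer = {
        b1.sum += b2.sum; b1.count += b2.count; b1
      }

      def finish(b: AvgBuffer): Double = b.sum.toDouble / b.count
      def bufferEncoder: Encoder[AvgBuffer] = Encoders.product[AvgBuffer]
      def outputEncoder: Encoder[Double] = Encoders.scalaDouble
    }

Applied with ds.select(MyAverage.toColumn) on a Dataset[Long].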

Re: Spark Aggregator for array of doubles

2017-01-04 Thread Anton Okolnychyi
Hi, take a look at this pull request, which is not merged yet: https://github.com/apache/spark/pull/16329 . It contains examples in Java and Scala that may be helpful. Best regards, Anton Okolnychyi On Jan 4, 2017 23:23, "Anil Langote" <anillangote0...@gmail.com> wrote: > Hi A
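In the same vein, a sketch of an element-wise sum over Array[Double], assuming all input arrays have equal length; the PR above has the authoritative examples:

    import org.apache.spark.sql.Encoder
    import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
    import org.apache.spark.sql.expressions.Aggregator

    object ArraySum extends Aggregator[Array[Double], Array[Double], Array[Double]] {
      def zero: Array[Double] = Array.emptyDoubleArray

      def reduce(buffer: Array[Double], row: Array[Double]): Array[Double] =
        merge(buffer, row)

      // Assumes b1 and b2 have the same length once both are non-empty.
      def merge(b1: Array[Double], b2: Array[Double]): Array[Double] = {
        if (b1.isEmpty) b2
        else if (b2.isEmpty) b1
        else {
          var i = 0
          while (i < b1.length) { b1(i) += b2(i); i += 1 }
          b1
        }
      }

      def finish(reduction: Array[Double]): Array[Double] = reduction
      def bufferEncoder: Encoder[Array[Double]] = ExpressionEncoder()
      def outputEncoder: Encoder[Array[Double]] = ExpressionEncoder()
    }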

Re: Spark Streaming with Kafka

2016-12-12 Thread Anton Okolnychyi
> > KafkaUtils.createDirectStream. > > 2) If you assign the same group id to several consumer instances, then all > > the consumers will get different sets of messages on the same topic. This is > > a kind of load balancing which Kafka provides with its Consumer API. >

Re: Spark Streaming with Kafka

2016-12-11 Thread Anton Okolnychyi
/spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html > > AFAIK, yes, you should use a unique group id for each stream (Kafka 0.10!!!) > >> kafkaParams.put("group.id", "use_a_separate_group_id_for_each_stream"); >> >> > > On Sun, Dec 1
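The surrounding code from that integration guide, sketched here for context; bootstrap.servers, the topic name, and the StreamingContext ssc are placeholders:

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      // One consumer group per stream, as the quoted advice says.
      "group.id" -> "use_a_separate_group_id_for_each_stream",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      PreferConsistent,
      Subscribe[String, String](Array("someTopic"), kafkaParams)
    )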

Spark Streaming with Kafka

2016-12-11 Thread Anton Okolnychyi
Hi, I am experimenting with Spark Streaming and Kafka. I would appreciate it if someone could say whether the following assumption is correct. If I have multiple computations (each with its own output) on one stream (created via KafkaUtils.createDirectStream), then there is a chance to have
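My reading of the scenario, sketched under the assumption that the concern is recomputation: each output operation launches its own job, so without persistence the same batch may be recomputed (and, for a direct stream, re-read from Kafka) once per output. Reusing the stream from the previous sketch:

    // Two independent outputs over one direct stream.
    val records = stream.map(r => (r.key, r.value))

    // cache() keeps the batch data around so the second output
    // does not recompute the lineage back to Kafka.
    records.cache()

    records.foreachRDD(rdd => rdd.foreach(println)) // output 1
    records.count().print()                         // output 2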

Re: Dataframe broadcast join hint not working

2016-11-26 Thread Anton Okolnychyi
Hi guys, I also experienced a situation where Spark 1.6.2 ignored my hint to do a broadcast join (i.e. broadcast(df)) with a small dataset. However, this happened in only 1 of 3 cases. Setting the "spark.sql.autoBroadcastJoinThreshold" property did not have any impact either. All 3 cases work
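For anyone hitting the same thing, the hint and the threshold look like this; largeDf and smallDf are placeholders, and on 1.6 the property is set via sqlContext.setConf rather than spark.conf:

    import org.apache.spark.sql.functions.broadcast

    // The threshold (in bytes) governs automatic broadcasts;
    // broadcast(df) is the explicit hint the thread refers to.
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 10L * 1024 * 1024)

    val joined = largeDf.join(broadcast(smallDf), Seq("id"))
    joined.explain() // expect BroadcastHashJoin in the physical plan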

Fwd:

2016-11-15 Thread Anton Okolnychyi
Hi, I have experienced a problem using the Datasets API in Spark 1.6, while almost identical code works fine in Spark 2.0. The problem is related to encoders and custom aggregators. *Spark 1.6 (the aggregation produces an empty map):* implicit val intStringMapEncoder: Encoder[Map[Int,
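The truncated declaration presumably continues along these lines; this is the usual way to obtain such an encoder, sketched from the snippet rather than the full original mail:

    import org.apache.spark.sql.Encoder
    import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder

    // Encoder for the aggregator's Map[Int, String] buffer/output.
    implicit val intStringMapEncoder: Encoder[Map[Int, String]] =
      ExpressionEncoder()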

Expression Encoder for Map[Int, String] in a custom Aggregator on a Dataset

2016-10-20 Thread Anton Okolnychyi
correct result. (33, Map(1 -> 1, 2 -> 2)) Any ideas/suggestions are more than welcome. Sincerely, Anton Okolnychyi

Re: Re: how to select first 50 value of each group after group by?

2016-07-07 Thread Anton Okolnychyi
I tried the rank API; however, it is not the API I want, because there > are some values with the same pv that are ranked as the same value. And the first 50 rows > of each frame are what I'm expecting. The attached file shows what I got by > using rank. > Thank you anyway, I learnt what rank could pro
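Since rank() gives tied rows the same value, row_number() is the usual fix for "first N per group"; a sketch with placeholder names (df, group, pv):

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.{col, row_number}

    // row_number() assigns distinct, consecutive numbers even on ties,
    // so a <= 50 filter yields at most 50 rows per group.
    val w = Window.partitionBy("group").orderBy(col("pv").desc)

    val top50 = df
      .withColumn("rn", row_number().over(w))
      .where(col("rn") <= 50)
      .drop("rn")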

Re: how to select first 50 value of each group after group by?

2016-07-06 Thread Anton Okolnychyi
The following resources should be useful: https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-sql-windows.html The last link should have the exact solution. 2016-07-06 16:55 GMT+02:00 Tal

Last() Window Function

2016-06-27 Thread Anton Okolnychyi
in the documentation where this behavior is described? If not, would it be appropriate for me to try to find where this can be done? - Would it be appropriate/useful to add some window function examples to spark/examples? There are none so far. Sincerely, Anton Okolnychyi
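The behavior in question, as I understand the thread: with an ORDER BY, the default window frame ends at the current row, so last() simply returns the current row's value. A sketch of the usual workaround with an explicit frame; df and the column names are placeholders, and on Spark 2.1+ the Window.unboundedPreceding/unboundedFollowing constants can replace the raw Long bounds:

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.{col, last}

    // Widen the frame to the whole partition so last() sees every row.
    val w = Window
      .partitionBy("id")
      .orderBy(col("ts"))
      .rowsBetween(Long.MinValue, Long.MaxValue)

    val withLast = df.withColumn("last_value", last(col("value")).over(w))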