problem extracting map from json

2016-07-07 Thread Michal Vince
Hi guys, I'm trying to extract a Map[String, Any] from a JSON string. This works well in every Scala REPL I tried, with both Scala 2.11 and 2.10 and with both the json4s and liftweb-json libraries, but if I try to do the same thing in spark-shell I always get "No information known about type..."

Re: groupBy and store in parquet

2016-05-12 Thread Michal Vince
for each type, and convert those to DF. You only convert to DF for events of the same type, so you avoid the NULLs. Xinh On Thu, May 5, 2016 at 2:52 AM, Michal Vince <vince.mic...@gmail.com> wrote: Hi Xinh For (1) the biggest problem are th
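
The per-type split suggested in this reply can be illustrated without Spark. The following is a minimal plain-Python sketch, not the thread's actual code: it assumes each event is a JSON object carrying a hypothetical `type` field, and it groups raw events by that field so that every group shares one shape and needs no NULL-padded columns.

```python
import json
from collections import defaultdict


def split_by_event_type(raw_events, type_field="type"):
    """Group JSON event strings by the value of `type_field`.

    `type_field` is a hypothetical field name; the thread does not say
    how the event type is encoded.
    """
    groups = defaultdict(list)
    for raw in raw_events:
        event = json.loads(raw)
        # Events without the field go into a catch-all "unknown" group.
        groups[event.get(type_field, "unknown")].append(event)
    return dict(groups)


# Made-up sample events with two different shapes.
sample = [
    '{"type": "login", "user": "a"}',
    '{"type": "payment", "amount": 10}',
    '{"type": "login", "user": "b"}',
]
grouped = split_by_event_type(sample)
for event_type, events in sorted(grouped.items()):
    # Each group has a single, consistent set of keys.
    print(event_type, len(events))
```

In Spark, each such group would then presumably be converted to its own DataFrame and written separately, as the reply describes.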

Re: groupBy and store in parquet

2016-05-05 Thread Michal Vince
parate the different event types upstream, like on different Kafka topics, and then process them separately? Xinh On Wed, May 4, 2016 at 7:47 AM, Michal Vince <vince.mic...@gmail.com> wrote: Hi guys I'm trying to store kafka stream

groupBy and store in parquet

2016-05-04 Thread Michal Vince
Hi guys, I'm trying to store a Kafka stream with ~5k events/s as efficiently as possible in Parquet format on HDFS. I can't make any changes to Kafka (it belongs to a 3rd party). Events in Kafka are in JSON format, but the problem is that there are many different event types (from different subsystems

1.6.0 spark.sql datetime conversion problem

2016-03-04 Thread Michal Vince
Hi guys, I'm using Spark 1.6.0 and I'm not sure if I found a bug or I'm doing something wrong. I'm playing with DataFrames and I'm converting ISO 8601 timestamps with millis to my timezone, which is Europe/Bratislava, with the from_utc_timestamp function from spark.sql.functions. The problem is that
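
For reference, the conversion described here can be checked outside Spark. This is a minimal plain-Python sketch using only the standard library, not Spark's from_utc_timestamp; the sample timestamp is made up, and the helper name `utc_iso_to_zone` is hypothetical. It parses an ISO 8601 string with milliseconds as UTC and renders it as wall-clock time in Europe/Bratislava.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+, relies on available tz data


def utc_iso_to_zone(iso_utc, zone_name):
    """Parse an ISO 8601 UTC string with millis and shift it to `zone_name`."""
    parsed = datetime.strptime(iso_utc, "%Y-%m-%dT%H:%M:%S.%fZ")
    as_utc = parsed.replace(tzinfo=timezone.utc)
    return as_utc.astimezone(ZoneInfo(zone_name))


# Made-up sample value; Bratislava is UTC+1 in early March (no DST yet).
local = utc_iso_to_zone("2016-03-04T10:15:30.123Z", "Europe/Bratislava")
print(local.isoformat(timespec="milliseconds"))
```

Comparing such a reference value with the DataFrame result may help tell a real bug apart from a misreading of the function's semantics.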