Re: how to dynamic partition dataframe

2017-01-19 Thread Michal Šenkýř
Hi, You can pass Seqs as varargs in Scala using this syntax: df.partitionBy(seq: _*) Michal On 18.1.2017 03:23, lk_spark wrote: hi, all: I want to partition data by reading a config file that tells me how to partition the current input data. DataFrameWriter has a method named:
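
For reference, a minimal sketch of this varargs expansion, assuming Spark 2.x; the column names and paths are hypothetical:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("partition-by-config").getOrCreate()
    val df = spark.read.parquet("/data/input")            // hypothetical input

    // suppose these column names were read from the config file
    val partitionCols: Seq[String] = Seq("year", "month") // hypothetical

    // DataFrameWriter.partitionBy takes String* varargs,
    // so a Seq must be expanded with `: _*`
    df.write
      .partitionBy(partitionCols: _*)
      .parquet("/data/output")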

Re: What is missing here to use sql in spark?

2017-01-01 Thread Michal Šenkýř
: sqlContext.sql(...).show() You can also assign it to a variable or register it as a new table (view) to work with it further: df2 = sqlContext.sql(...) or: sqlContext.sql(...).createOrReplaceTempView("flight201601_carriers") Regards, Michal Šenkýř On 2.1.2017 05:22, Raymond Xie wrote:
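
A minimal Scala sketch of the three options, using the Spark 2.x SparkSession; the query columns and the flight201601 table are assumed for illustration:

    // assuming a SparkSession `spark` and a registered table `flight201601`
    spark.sql("SELECT carrier, origin FROM flight201601").show()

    // or assign the result to a variable to work with it further
    val df2 = spark.sql("SELECT carrier, origin FROM flight201601")

    // or register it as a new temporary view for subsequent queries
    df2.createOrReplaceTempView("flight201601_carriers")
    spark.sql("SELECT DISTINCT carrier FROM flight201601_carriers").show()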

Re: Dataset encoders for further types?

2016-12-17 Thread Michal Šenkýř
I actually already made a pull request adding support for arbitrary sequence types. https://github.com/apache/spark/pull/16240 There is still a small problem: Seq.toDS does not work for those types (I couldn't get implicits with multiple type parameters to resolve correctly), but createDataset
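
A hypothetical illustration of that distinction, assuming the pull request's sequence-type support is in place:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().getOrCreate()
    import spark.implicits._

    val data = Seq(Vector(1, 2, 3), Vector(4, 5))

    // explicit createDataset picks up the encoder for the sequence type
    val ds = spark.createDataset(data)

    // while the implicit .toDS conversion may not resolve for such types,
    // per the multi-type-parameter implicit issue mentioned above:
    // data.toDS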

Accessing classpath resources in Spark Shell

2016-12-07 Thread Michal Šenkýř
Hello everyone, I recently encountered a situation where I needed to add a custom classpath resource to my driver and access it from an included library (specifically a configuration file for a custom Dataframe Reader). I need to use it from both inside an application which I submit to the
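
The message is truncated here, but one common approach to this kind of setup is to put the resource directory on the driver classpath and read the file through the class loader; a minimal sketch, with a hypothetical file name:

    // e.g. started with: spark-shell --driver-class-path /path/to/resources
    val in = getClass.getClassLoader.getResourceAsStream("reader.conf")
    val contents = scala.io.Source.fromInputStream(in).mkString
    println(contents)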

Re: How to convert a unix timestamp column into date format(yyyy-MM-dd) ?

2016-12-05 Thread Michal Šenkýř
Yet another approach: scala> val df1 = df.selectExpr("client_id", "from_unixtime(ts/1000,'yyyy-MM-dd') as ts") Mgr. Michal Šenkýř mike.sen...@gmail.com +420 605 071 818 On 5.12.2016 09:22, Deepak Sharma wrote: Another, simpler approach would be: scala> val find
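
As a sketch, the same conversion with the typed column API instead of selectExpr, assuming ts holds epoch milliseconds (hence the division by 1000):

    import org.apache.spark.sql.functions.from_unixtime

    // assuming a SparkSession `spark` and the DataFrame `df` from above
    import spark.implicits._
    val df1 = df.select($"client_id", from_unixtime($"ts" / 1000, "yyyy-MM-dd").as("ts"))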

Re: Can spark support exactly once based kafka ? Due to these following question?

2016-12-04 Thread Michal Šenkýř
Hello John, 1. If a task completes the operation, it will notify the driver. The driver may not receive the message due to the network and think the task is still running. Then the child stage won't be scheduled? Spark's fault tolerance policy is, if there is a problem in processing a task or

Re: RDD flatmap to multiple key/value pairs

2016-12-02 Thread Michal Šenkýř
Hello im281, The transformations equivalent to the first mapper would look like this in Java: .flatMap(line -> Arrays.asList(line.split(" ")).iterator()) .filter(word -> Character.isUpperCase(word.charAt(0))) .mapToPair(word -> new Tuple2<>(word, 1)) The second mapper would look more
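
For comparison (a sketch, not from the original thread), the same pipeline in Scala, assuming lines is an RDD[String]; pairs need no mapToPair there:

    val pairs = lines
      .flatMap(_.split(" "))
      .filter(word => word.nonEmpty && word.head.isUpper)
      .map(word => (word, 1))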

Re: Spark 2.x Pyspark Spark SQL createDataframe Error

2016-12-01 Thread Michal Šenkýř
application and passed around. I also have no idea why the behavior changed between Spark 1.6 and Spark 2.0. Michal Šenkýř On Thu, Dec 1, 2016, 18:33 Vinayak Joshi5 <vijos...@in.ibm.com> wrote: > This is the error received: > > > 16/12/01 22:35:36 ERROR Schema: Failed ini