import sql.implicits._

2016-10-14 Thread Jakub Dubovsky
Hey community, I would like to *educate* myself about why all *sql implicits* (most notably the conversions to the Dataset API) are imported from an *instance* of SparkSession rather than via static imports. With this design one runs into problems like this

Re: import sql.implicits._

2016-10-14 Thread Koert Kuipers
For example, when you do Seq(1,2,3).toDF("a"), it needs to get the SparkSession from somewhere. By importing the implicits from spark.implicits._, they have access to a SparkSession for operations like this. On Fri, Oct 14, 2016 at 4:42 PM, Jakub Dubovsky < spark.dubovsky.ja...@gmail.com> wrote: >
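As a sketch of the point Koert is making (session name `spark` and the local master are illustrative, not from the thread): the implicits are members of a concrete SparkSession instance, which is exactly how a later `toDF` call knows which session to use.

```scala
import org.apache.spark.sql.SparkSession

object ImplicitsExample {
  def main(args: Array[String]): Unit = {
    // The implicits live on a session instance, so importing them
    // requires a stable identifier such as `spark` -- there is no
    // static object to import them from.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("implicits-example")
      .getOrCreate()

    import spark.implicits._ // instance import, not a static one

    // toDF uses the session captured by the implicits above
    val df = Seq(1, 2, 3).toDF("a")
    df.show()

    spark.stop()
  }
}
```

Because the import is tied to `spark`, the conversion can only exist once a session exists, which is why a static import cannot work here.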

Re: import sql.implicits._

2016-10-14 Thread Koert Kuipers
Basically, the implicit conversions that need it are rdd => dataset and seq => dataset. On Fri, Oct 14, 2016 at 5:47 PM, Koert Kuipers wrote: >

Re: import sql.implicits._

2016-10-14 Thread Koert Kuipers
About the Stack Overflow question, do this: def validateAndTransform(df: DataFrame): DataFrame = { import df.sparkSession.implicits._ ... } On Fri, Oct 14, 2016 at 5:51 PM, Koert Kuipers wrote: >
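Koert's suggestion written out as a minimal sketch (the column name "a" and the filter are illustrative): when a method receives only a DataFrame, the implicits can be pulled off the frame's own session.

```scala
import org.apache.spark.sql.DataFrame

// Pattern from the thread: no SparkSession parameter is needed,
// because every DataFrame carries a reference to its session.
def validateAndTransform(df: DataFrame): DataFrame = {
  import df.sparkSession.implicits._
  // Implicits are now in scope for this method body only,
  // e.g. the $"col" column syntax or .as[T] conversions.
  df.filter($"a" > 0)
}
```

This keeps library code free of a global session while still getting the conversions where they are needed.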

RE: Kafka integration: get existing Kafka messages?

2016-10-14 Thread Haopu Wang
Cody, the link is helpful, but I still have issues in my test. I set "auto.offset.reset" to "earliest" and then create a KafkaRDD using an OffsetRange which is out of range. According to Kafka's documentation, I expect to get the earliest offset of that partition, but instead I get the exception below and it looks

Re: SparkR execution hang on when handle a RDD which is converted from DataFrame

2016-10-14 Thread Lantao Jin
40GB. 2016-10-14 14:20 GMT+08:00 Felix Cheung: > How big is the metrics_moveing_detection_cube table? > On Thu, Oct 13, 2016 at 8:51 PM -0700, "Lantao Jin" wrote: > sqlContext <- sparkRHive.init(sc) > sqlString <- > "SELECT > key_id,

Re: Want to test spark-sql-kafka but get unresolved dependency error

2016-10-14 Thread Julian Keppel
Okay, thank you! Can you say when this feature will be released? 2016-10-13 16:29 GMT+02:00 Cody Koeninger: > As Sean said, it's unreleased. If you want to try it out, build Spark: > http://spark.apache.org/docs/latest/building-spark.html > The easiest way to include
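For reference, the build step behind the link Cody gives is the standard Maven build from source; the exact flags below follow the linked build documentation, and the clone URL is the standard Apache mirror.

```shell
# Build Spark from source to get modules that are not yet released,
# such as spark-sql-kafka (per the linked building-spark docs).
git clone https://github.com/apache/spark.git
cd spark
./build/mvn -DskipTests clean package
```

The resulting jars can then be put on the classpath of a test application.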

Re: Want to test spark-sql-kafka but get unresolved dependency error

2016-10-14 Thread Cody Koeninger
I can't be sure, no. On Fri, Oct 14, 2016 at 3:06 AM, Julian Keppel wrote: > Okay, thank you! Can you say when this feature will be released? >

Re: Problems with new experimental Kafka Consumer for 0.10

2016-10-14 Thread Cody Koeninger
For you or anyone else having issues with consumer rebalance: what are your settings for heartbeat.interval.ms, session.timeout.ms, and group.max.session.timeout.ms relative to your batch time? On Tue, Oct 11, 2016 at 10:19 AM, static-max wrote: > Hi, > I run into the
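Cody's question can be made concrete. The usual guidance for the 0.10 consumer is that session.timeout.ms should comfortably exceed the time a batch can occupy the consumer, heartbeat.interval.ms should be well below session.timeout.ms, and the broker-side group.max.session.timeout.ms must be at least session.timeout.ms. The broker address, group id, and timeout values below are illustrative, not recommendations from the thread.

```scala
import org.apache.kafka.common.serialization.StringDeserializer

// Illustrative settings for, say, a 30-second batch interval.
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker1:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "example-group",
  // Longer than the longest time a batch can block the consumer,
  // or the broker will consider the consumer dead and rebalance.
  "session.timeout.ms" -> "60000",
  // Conventionally no more than a third of session.timeout.ms.
  "heartbeat.interval.ms" -> "20000"
)
// Broker-side: group.max.session.timeout.ms >= session.timeout.ms,
// otherwise the consumer's requested timeout is rejected.
```

If batches can run longer than session.timeout.ms, the group coordinator triggers exactly the rebalances being reported in this thread.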

Re: Kafka integration: get existing Kafka messages?

2016-10-14 Thread Cody Koeninger
If you're creating a Kafka RDD as opposed to a dstream, you're explicitly specifying a beginning and an ending offset, so auto.offset.reset doesn't really have anything to do with it. If you look at that log line, it's trying to read the 2nd message out of the 0th partition of mytopic2, and not able to
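A sketch of the distinction Cody draws, against the spark-streaming-kafka-0-10 API: a Kafka RDD reads exactly the offsets you pass in, so auto.offset.reset (which only applies when a consumer has no explicit position) never enters the picture. The topic, partition, and offsets mirror the log line discussed; the method wrapper is illustrative.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.streaming.kafka010.{KafkaUtils, LocationStrategies, OffsetRange}

// createRDD reads exactly [fromOffset, untilOffset) per partition.
// If those offsets no longer exist on the broker, the read fails
// with an out-of-range error instead of falling back to
// auto.offset.reset, because the position was given explicitly.
def readRange(sc: SparkContext,
              kafkaParams: java.util.Map[String, Object]) = {
  val offsetRanges = Array(
    // 2nd message of partition 0 of mytopic2, as in the log line
    OffsetRange("mytopic2", 0, fromOffset = 1L, untilOffset = 2L)
  )
  KafkaUtils.createRDD[String, String](
    sc, kafkaParams, offsetRanges, LocationStrategies.PreferConsistent)
}
```

This is why the earlier test in this thread saw an exception rather than the partition's earliest offset.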

Re: spark with kerberos

2016-10-14 Thread Steve Loughran
On 13 Oct 2016, at 10:50, dbolshak wrote: Hello community, we have a challenge and no idea how to solve it. The problem: say we have the following environment: 1. `cluster A`, the cluster does not use Kerberos and we use it as a