Re: Thanks For a Job Well Done !!!

2016-06-18 Thread Reynold Xin
Thanks for the kind words, Krishna! Please keep the feedback coming.

On Saturday, June 18, 2016, Krishna Sankar  wrote:

> Hi all,
>Just wanted to thank all for the dataset API - most of the time we see
> only bugs on these lists ;o).
>
>- To put this in context, this weekend I was updating the SQL chapters
>of my book - it had all the ugliness of SchemaRDD,
>registerTempTable, take(10).foreach(println)
>and take(30).foreach(e => println("%15s | %9.2f |".format(e(0), e(1)))) ;o)
>- I remember Hossein Falaki chiding me about the ugly println
>   statements!
>   - Took me a little while to grok Dataset, SparkSession,
>   spark.read.option("header","true").option("inferSchema","true").csv(...)
>   et al.
>  - I am a big R fan and know the language pretty decently - so the
>  constructs are familiar
>   - Once I got it (I am sure there are still more mysteries to
>uncover...) it was just beautiful - well done, folks!!!
>- One sees the contrast a lot better while teaching or writing books,
>because one has to think through the old, the new, and the transitional arc
>   - I even remember the good old days when, at one of Paco's sessions,
>   we were discussing whether Spark would get R-like dataframes!
>   - And now, it looks very decent for data wrangling.
>
> Cheers & keep up the good work
> 
> P.S.: My next chapter is MLlib - I need to convert to ml. Should be
> interesting... I am a glutton for punishment - of the Spark kind, of
> course!
>


Thanks For a Job Well Done !!!

2016-06-18 Thread Krishna Sankar
Hi all,
   Just wanted to thank all for the dataset API - most of the time we see
only bugs on these lists ;o).

   - To put this in context, this weekend I was updating the SQL chapters of
   my book - it had all the ugliness of SchemaRDD,
   registerTempTable, take(10).foreach(println)
   and take(30).foreach(e => println("%15s | %9.2f |".format(e(0), e(1)))) ;o)
   (a sketch of the contrast follows this message)
   - I remember Hossein Falaki chiding me about the ugly println statements!
   - Took me a little while to grok Dataset, SparkSession,
   spark.read.option("header","true").option("inferSchema","true").csv(...)
   et al.
   - I am a big R fan and know the language pretty decently - so the
   constructs are familiar
   - Once I got it (I am sure there are still more mysteries to
   uncover...) it was just beautiful - well done, folks!!!
   - One sees the contrast a lot better while teaching or writing books,
   because one has to think through the old, the new, and the transitional arc
   - I even remember the good old days when, at one of Paco's sessions,
   we were discussing whether Spark would get R-like dataframes!
   - And now, it looks very decent for data wrangling.

Cheers & keep up the good work

P.S.: My next chapter is MLlib - I need to convert to ml. Should be
interesting... I am a glutton for punishment - of the Spark kind, of
course!
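
[A minimal sketch of the before/after contrast described above; the CSV
file and its columns are hypothetical, assumed only for illustration:]

  // Hypothetical input: a CSV of (product: String, revenue: Double).
  val df = spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("data/sales.csv")

  // The old way: format each Row by hand.
  df.take(30).foreach(e => println("%15s | %9.2f |".format(e(0), e(1))))

  // The Spark 2.0 way: let the DataFrame render itself.
  df.show(30)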


Re: Does dataframe write append mode work with text format

2016-06-18 Thread Yash Sharma
Awesome!! Will give it a try again. Thanks!!

- Thanks, via mobile, excuse brevity.
On Jun 19, 2016 11:32 AM, "Xiao Li"  wrote:

> Hi, Yash,
>
> It should work.
>
>  val df = spark.range(1, 5)
>    .select('id + 1 as 'p1, 'id + 2 as 'p2, 'id + 3 as 'p3, 'id + 4 as 'p4,
>      'id + 5 as 'p5, 'id as 'b)
>    .selectExpr("p1", "p2", "p3", "p4", "p5", "CAST(b AS STRING) AS s")
>    .coalesce(1)
>
>  df.write.partitionBy("p1", "p2", "p3", "p4", "p5").text(dir.getCanonicalPath)
>  val newDF = spark.read.text(dir.getCanonicalPath)
>  newDF.show()
>
>  df.write.partitionBy("p1", "p2", "p3", "p4", "p5")
>    .mode(SaveMode.Append).text(dir.getCanonicalPath)
>  val newDF2 = spark.read.text(dir.getCanonicalPath)
>  newDF2.show()
>
> I tried it. It works well.
>
> Thanks,
>
> Xiao Li
>
> 2016-06-18 8:57 GMT-07:00 Yash Sharma :
>
>> Hi All,
>> I have been using append mode with the parquet format for writes, which
>> works just fine. Just wanted to check whether the same is supported for
>> the plain text format. The code below blows up with an error saying the
>> file already exists.
>>
>>
>>
>> {code}
>> userEventsDF.write.mode("append").partitionBy("year", "month", "date").text(outputDir)
>> // or
>> userEventsDF.write.mode("append").partitionBy("year", "month", "date").format("text").save(outputDir)
>> {code}
>>
>
>


Re: Does dataframe write append mode work with text format

2016-06-18 Thread Xiao Li
Hi, Yash,

It should work.

 val df = spark.range(1, 5)
   .select('id + 1 as 'p1, 'id + 2 as 'p2, 'id + 3 as 'p3, 'id + 4 as 'p4,
     'id + 5 as 'p5, 'id as 'b)
   .selectExpr("p1", "p2", "p3", "p4", "p5", "CAST(b AS STRING) AS s")
   .coalesce(1)

 df.write.partitionBy("p1", "p2", "p3", "p4", "p5").text(dir.getCanonicalPath)
 val newDF = spark.read.text(dir.getCanonicalPath)
 newDF.show()

 df.write.partitionBy("p1", "p2", "p3", "p4", "p5")
   .mode(SaveMode.Append).text(dir.getCanonicalPath)
 val newDF2 = spark.read.text(dir.getCanonicalPath)
 newDF2.show()

I tried it. It works well.

Thanks,

Xiao Li

2016-06-18 8:57 GMT-07:00 Yash Sharma :

> Hi All,
> I have been using append mode with the parquet format for writes, which
> works just fine. Just wanted to check whether the same is supported for the
> plain text format. The code below blows up with an error saying the file
> already exists.
>
>
>
> {code}
> userEventsDF.write.mode("append").partitionBy("year", "month", "date").text(outputDir)
> // or
> userEventsDF.write.mode("append").partitionBy("year", "month", "date").format("text").save(outputDir)
> {code}
>


Does dataframe write append mode work with text format

2016-06-18 Thread Yash Sharma
Hi All,
I have been using append mode with the parquet format for writes, which
works just fine. Just wanted to check whether the same is supported for the
plain text format. The code below blows up with an error saying the file
already exists.



{code}
userEventsDF.write.mode("append").partitionBy("year", "month", "date").text(outputDir)
// or
userEventsDF.write.mode("append").partitionBy("year", "month", "date").format("text").save(outputDir)
{code}
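
[A minimal self-contained check of the question above, adapted from Xiao's
example earlier in the thread; the output path is hypothetical, and a Spark
2.0 SparkSession named spark is assumed:]

  import org.apache.spark.sql.SaveMode

  // Hypothetical output location, for illustration only.
  val out = "/tmp/append-text-check"

  // The text source expects a single string column.
  val df = spark.range(1, 5).selectExpr("CAST(id AS STRING) AS value")

  df.write.mode(SaveMode.Overwrite).text(out)
  val before = spark.read.text(out).count()  // 4 rows

  df.write.mode(SaveMode.Append).text(out)
  val after = spark.read.text(out).count()   // 8 rows if append works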


Re: [VOTE] Release Apache Spark 1.6.2 (RC1)

2016-06-18 Thread Jacek Laskowski
+1

Pozdrawiam,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Sat, Jun 18, 2016 at 9:13 AM, Reynold Xin  wrote:
> Looks like that's resolved now.
>
> I will wait till Sunday to cut rc2 to give people more time to find issues
> with rc1.
>
>
> On Fri, Jun 17, 2016 at 10:58 AM, Marcelo Vanzin 
> wrote:
>>
>> -1 (non-binding)
>>
>> SPARK-16017 shows a severe perf regression in YARN compared to 1.6.1.
>>
>> On Thu, Jun 16, 2016 at 9:49 PM, Reynold Xin  wrote:
>> > Please vote on releasing the following candidate as Apache Spark version
>> > 1.6.2!
>> >
>> > The vote is open until Sunday, June 19, 2016 at 22:00 PDT and passes if
>> > a majority of at least 3 +1 PMC votes are cast.
>> >
>> > [ ] +1 Release this package as Apache Spark 1.6.2
>> > [ ] -1 Do not release this package because ...
>> >
>> >
>> > The tag to be voted on is v1.6.2-rc1
>> > (4168d9c94a9564f6b3e62f5d669acde13a7c7cf6)
>> >
>> > The release files, including signatures, digests, etc. can be found at:
>> > https://home.apache.org/~pwendell/spark-releases/spark-1.6.2-rc1-bin/
>> >
>> > Release artifacts are signed with the following key:
>> > https://people.apache.org/keys/committer/pwendell.asc
>> >
>> > The staging repository for this release can be found at:
>> > https://repository.apache.org/content/repositories/orgapachespark-1184
>> >
>> > The documentation corresponding to this release can be found at:
>> > https://home.apache.org/~pwendell/spark-releases/spark-1.6.2-rc1-docs/
>> >
>> >
>> > ===
>> > == How can I help test this release? ==
>> > ===
>> > If you are a Spark user, you can help us test this release by taking an
>> > existing Spark workload and running on this release candidate, then
>> > reporting any regressions from 1.6.1.
>> >
>> > 
>> > == What justifies a -1 vote for this release? ==
>> > 
>> > This is a maintenance release in the 1.6.x series.  Bugs already present
>> > in
>> > 1.6.1, missing features, or bugs related to new features will not
>> > necessarily block this release.
>> >
>>
>>
>>
>> --
>> Marcelo
>>
>




Re: Spark 2.0 Dataset Documentation

2016-06-18 Thread Pedro Rodriguez
Going to go ahead and start working on the docs, assuming this gets
merged: https://github.com/apache/spark/pull/13592. Opened a JIRA:
https://issues.apache.org/jira/browse/SPARK-16046

Having some issues building the docs. The Java docs fail to build; the
output when they fail is here:
https://gist.github.com/EntilZha/9c585662ef7cda820c311d1c7eb16e42

This might be causing an issue where loading the API docs fails due to some
JavaScript errors (the page doesn't seem to switch correctly). The main one,
repeated several times, is: main.js:2 Uncaught SyntaxError: Unexpected token <

Pedro

On Sat, Jun 18, 2016 at 6:03 AM, Jacek Laskowski  wrote:

> On Sat, Jun 18, 2016 at 6:13 AM, Pedro Rodriguez
>  wrote:
>
> > using Datasets (e.g. using $ to select columns).
>
> Or even my favourite one - the tick ` :-)
>
> Jacek
>



-- 
Pedro Rodriguez
PhD Student in Distributed Machine Learning | CU Boulder
UC Berkeley AMPLab Alumni

ski.rodrig...@gmail.com | pedrorodriguez.io | 909-353-4423
Github: github.com/EntilZha | LinkedIn:
https://www.linkedin.com/in/pedrorodriguezscience


Re: Spark 2.0 Dataset Documentation

2016-06-18 Thread Jacek Laskowski
On Sat, Jun 18, 2016 at 6:13 AM, Pedro Rodriguez
 wrote:

> using Datasets (e.g. using $ to select columns).

Or even my favourite one - the tick ` :-)

Jacek
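
[A small sketch of the two shorthands mentioned above; it assumes the usual
implicits are imported and a hypothetical DataFrame df with "name" and "age"
columns, plus one column whose name contains a space:]

  import spark.implicits._

  // The $ interpolator turns a string into a Column.
  df.select($"name", $"age" + 1).show()

  // Backticks quote identifiers with special characters in SQL expressions.
  df.selectExpr("`my column` AS c").show()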




Re: [VOTE] Release Apache Spark 1.6.2 (RC1)

2016-06-18 Thread Reynold Xin
Looks like that's resolved now.

I will wait till Sunday to cut rc2 to give people more time to find issues
with rc1.


On Fri, Jun 17, 2016 at 10:58 AM, Marcelo Vanzin 
wrote:

> -1 (non-binding)
>
> SPARK-16017 shows a severe perf regression in YARN compared to 1.6.1.
>
> On Thu, Jun 16, 2016 at 9:49 PM, Reynold Xin  wrote:
> > Please vote on releasing the following candidate as Apache Spark version
> > 1.6.2!
> >
> > The vote is open until Sunday, June 19, 2016 at 22:00 PDT and passes if a
> > majority of at least 3 +1 PMC votes are cast.
> >
> > [ ] +1 Release this package as Apache Spark 1.6.2
> > [ ] -1 Do not release this package because ...
> >
> >
> > The tag to be voted on is v1.6.2-rc1
> > (4168d9c94a9564f6b3e62f5d669acde13a7c7cf6)
> >
> > The release files, including signatures, digests, etc. can be found at:
> > https://home.apache.org/~pwendell/spark-releases/spark-1.6.2-rc1-bin/
> >
> > Release artifacts are signed with the following key:
> > https://people.apache.org/keys/committer/pwendell.asc
> >
> > The staging repository for this release can be found at:
> > https://repository.apache.org/content/repositories/orgapachespark-1184
> >
> > The documentation corresponding to this release can be found at:
> > https://home.apache.org/~pwendell/spark-releases/spark-1.6.2-rc1-docs/
> >
> >
> > ===
> > == How can I help test this release? ==
> > ===
> > If you are a Spark user, you can help us test this release by taking an
> > existing Spark workload and running on this release candidate, then
> > reporting any regressions from 1.6.1.
> >
> > 
> > == What justifies a -1 vote for this release? ==
> > 
> > This is a maintenance release in the 1.6.x series.  Bugs already present
> in
> > 1.6.1, missing features, or bugs related to new features will not
> > necessarily block this release.
> >
>
>
>
> --
> Marcelo
>
>
>