Re: Thanks For a Job Well Done !!!
Thanks for the kind words, Krishna! Please keep the feedback coming.

On Saturday, June 18, 2016, Krishna Sankar wrote:
> Hi all,
> Just wanted to thank all for the Dataset API - most of the time we see
> only bugs in these lists ;o).
>
> - Putting some context, this weekend I was updating the SQL chapters of my
>   book - it had all the ugliness of SchemaRDD, registerTempTable,
>   take(10).foreach(println) and
>   take(30).foreach(e=>println("%15s | %9.2f |".format(e(0),e(1 ;o)
> - I remember Hossein Falaki chiding me about the ugly println statements!
> - Took me a little while to grok the Dataset, SparkSession,
>   spark.read.option("header","true").option("inferSchema","true").csv(...) et al.
> - I am a big R fan and know the language pretty well - so the constructs are familiar.
> - Once I got it (I am sure there are still more mysteries to uncover ...)
>   it was just beautiful - well done folks !!!
> - One sees the contrast a lot better while teaching or writing books,
>   because one has to think through the old, the new and the transitional arc.
> - I even remember the good old days when we were discussing whether Spark
>   would get dataframes like R at one of Paco's sessions!
> - And now, it looks very decent for data wrangling.
>
> Cheers & keep up the good work
>
> P.S: My next chapter is MLlib - need to convert to ml. Should be
> interesting ... I am a glutton for punishment - of the Spark kind, of course!
Thanks For a Job Well Done !!!
Hi all,

Just wanted to thank all for the Dataset API - most of the time we see only
bugs in these lists ;o).

- Putting some context, this weekend I was updating the SQL chapters of my
  book - it had all the ugliness of SchemaRDD, registerTempTable,
  take(10).foreach(println) and
  take(30).foreach(e=>println("%15s | %9.2f |".format(e(0),e(1 ;o)
- I remember Hossein Falaki chiding me about the ugly println statements!
- Took me a little while to grok the Dataset, SparkSession,
  spark.read.option("header","true").option("inferSchema","true").csv(...) et al.
- I am a big R fan and know the language pretty well - so the constructs are familiar.
- Once I got it (I am sure there are still more mysteries to uncover ...) it
  was just beautiful - well done folks !!!
- One sees the contrast a lot better while teaching or writing books, because
  one has to think through the old, the new and the transitional arc.
- I even remember the good old days when we were discussing whether Spark
  would get dataframes like R at one of Paco's sessions!
- And now, it looks very decent for data wrangling.

Cheers & keep up the good work

P.S: My next chapter is MLlib - need to convert to ml. Should be
interesting ... I am a glutton for punishment - of the Spark kind, of course!
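A minimal sketch of the Spark 2.0-style read Krishna describes, next to the constructs it replaces - assuming a local SparkSession and a made-up people.csv with name and age columns (neither is from the original mail):

{code}
import org.apache.spark.sql.SparkSession

// Spark 2.0: one entry point instead of SQLContext/HiveContext.
val spark = SparkSession.builder()
  .appName("csv-sketch")
  .master("local[*]")
  .getOrCreate()

// Hypothetical input; header and schema inference exactly as in the options above.
val people = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("people.csv")

people.createOrReplaceTempView("people")          // replaces registerTempTable
spark.sql("SELECT name, age FROM people").show()  // replaces take(n).foreach(println)
{code}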
Re: Does dataframe write append mode work with text format
Awesome!! Will give it a try again. Thanks!!

- Thanks, via mobile, excuse brevity.

On Jun 19, 2016 11:32 AM, "Xiao Li" wrote:
> Hi, Yash,
>
> It should work.
>
> val df = spark.range(1, 5)
>   .select('id + 1 as 'p1, 'id + 2 as 'p2, 'id + 3 as 'p3, 'id + 4 as 'p4,
>     'id + 5 as 'p5, 'id as 'b)
>   .selectExpr("p1", "p2", "p3", "p4", "p5", "CAST(b AS STRING) AS s")
>   .coalesce(1)
>
> df.write.partitionBy("p1", "p2", "p3", "p4", "p5").text(dir.getCanonicalPath)
> val newDF = spark.read.text(dir.getCanonicalPath)
> newDF.show()
>
> df.write.partitionBy("p1", "p2", "p3", "p4", "p5")
>   .mode(SaveMode.Append).text(dir.getCanonicalPath)
> val newDF2 = spark.read.text(dir.getCanonicalPath)
> newDF2.show()
>
> I tried it. It works well.
>
> Thanks,
>
> Xiao Li
>
> 2016-06-18 8:57 GMT-07:00 Yash Sharma:
>> Hi All,
>> I have been using the parquet append mode for write, which works just
>> fine. Just wanted to check if the same is supported for plain text format.
>> The code below blows up with an error saying the file already exists.
>>
>> {code}
>> userEventsDF.write.mode("append").partitionBy("year", "month",
>>   "date").text(outputDir)
>> or,
>> userEventsDF.write.mode("append").partitionBy("year", "month",
>>   "date").format("text").save(outputDir)
>> {code}
Re: Does dataframe write append mode work with text format
Hi, Yash,

It should work.

val df = spark.range(1, 5)
  .select('id + 1 as 'p1, 'id + 2 as 'p2, 'id + 3 as 'p3, 'id + 4 as 'p4,
    'id + 5 as 'p5, 'id as 'b)
  .selectExpr("p1", "p2", "p3", "p4", "p5", "CAST(b AS STRING) AS s")
  .coalesce(1)

df.write.partitionBy("p1", "p2", "p3", "p4", "p5").text(dir.getCanonicalPath)
val newDF = spark.read.text(dir.getCanonicalPath)
newDF.show()

df.write.partitionBy("p1", "p2", "p3", "p4", "p5")
  .mode(SaveMode.Append).text(dir.getCanonicalPath)
val newDF2 = spark.read.text(dir.getCanonicalPath)
newDF2.show()

I tried it. It works well.

Thanks,

Xiao Li

2016-06-18 8:57 GMT-07:00 Yash Sharma:
> Hi All,
> I have been using the parquet append mode for write, which works just
> fine. Just wanted to check if the same is supported for plain text format.
> The code below blows up with an error saying the file already exists.
>
> {code}
> userEventsDF.write.mode("append").partitionBy("year", "month",
>   "date").text(outputDir)
> or,
> userEventsDF.write.mode("append").partitionBy("year", "month",
>   "date").format("text").save(outputDir)
> {code}
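If anyone wants to run the snippet above outside a test suite, here is a self-contained sketch of the same steps - assuming Spark 2.0 and a local master. The `dir` and `spark` values are not defined in the original mail, so a temporary directory and a fresh SparkSession stand in for them:

{code}
import java.nio.file.Files
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().appName("append-text-sketch").master("local[*]").getOrCreate()
import spark.implicits._   // for the 'symbol column syntax below

// Stand-in for the original `dir`: a fresh path that does not exist yet.
val dir = new java.io.File(Files.createTempDirectory("spark-text-").toFile, "out")

val df = spark.range(1, 5)
  .select('id + 1 as 'p1, 'id + 2 as 'p2, 'id + 3 as 'p3, 'id + 4 as 'p4,
    'id + 5 as 'p5, 'id as 'b)
  .selectExpr("p1", "p2", "p3", "p4", "p5", "CAST(b AS STRING) AS s")
  .coalesce(1)

// The first write creates the partition directories ...
df.write.partitionBy("p1", "p2", "p3", "p4", "p5").text(dir.getCanonicalPath)
// ... and the append-mode write adds files next to the existing ones instead of failing.
df.write.partitionBy("p1", "p2", "p3", "p4", "p5")
  .mode(SaveMode.Append).text(dir.getCanonicalPath)

spark.read.text(dir.getCanonicalPath).show()
{code}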
Does dataframe write append mode work with text format
Hi All,
I have been using the parquet append mode for write, which works just fine.
Just wanted to check if the same is supported for plain text format. The code
below blows up with an error saying the file already exists.

{code}
userEventsDF.write.mode("append").partitionBy("year", "month", "date").text(outputDir)
or,
userEventsDF.write.mode("append").partitionBy("year", "month", "date").format("text").save(outputDir)
{code}
Re: [VOTE] Release Apache Spark 1.6.2 (RC1)
+1

Best regards,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

On Sat, Jun 18, 2016 at 9:13 AM, Reynold Xin wrote:
> Looks like that's resolved now.
>
> I will wait till Sunday to cut rc2 to give people more time to find issues with rc1.
>
> On Fri, Jun 17, 2016 at 10:58 AM, Marcelo Vanzin wrote:
>> -1 (non-binding)
>>
>> SPARK-16017 shows a severe perf regression in YARN compared to 1.6.1.
>>
>> On Thu, Jun 16, 2016 at 9:49 PM, Reynold Xin wrote:
>>> Please vote on releasing the following candidate as Apache Spark version 1.6.2!
>>>
>>> The vote is open until Sunday, June 19, 2016 at 22:00 PDT and passes if a
>>> majority of at least 3 +1 PMC votes are cast.
>>>
>>> [ ] +1 Release this package as Apache Spark 1.6.2
>>> [ ] -1 Do not release this package because ...
>>>
>>> The tag to be voted on is v1.6.2-rc1 (4168d9c94a9564f6b3e62f5d669acde13a7c7cf6)
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> https://home.apache.org/~pwendell/spark-releases/spark-1.6.2-rc1-bin/
>>>
>>> Release artifacts are signed with the following key:
>>> https://people.apache.org/keys/committer/pwendell.asc
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1184
>>>
>>> The documentation corresponding to this release can be found at:
>>> https://home.apache.org/~pwendell/spark-releases/spark-1.6.2-rc1-docs/
>>>
>>> == How can I help test this release? ==
>>>
>>> If you are a Spark user, you can help us test this release by taking an
>>> existing Spark workload and running it on this release candidate, then
>>> reporting any regressions from 1.6.1.
>>>
>>> == What justifies a -1 vote for this release? ==
>>>
>>> This is a maintenance release in the 1.6.x series. Bugs already present in
>>> 1.6.1, missing features, or bugs related to new features will not
>>> necessarily block this release.
>>
>> --
>> Marcelo
Re: Spark 2.0 Dataset Documentation
Going to go ahead and start working on the docs, assuming this gets merged:
https://github.com/apache/spark/pull/13592. Opened a JIRA:
https://issues.apache.org/jira/browse/SPARK-16046

Having some issues building the docs. The Java docs fail to build; the output
when it fails is here:
https://gist.github.com/EntilZha/9c585662ef7cda820c311d1c7eb16e42

This might be causing an issue where loading the API docs fails due to some
JavaScript errors (it doesn't seem to switch pages correctly). The main one,
repeated several times, is:

main.js:2 Uncaught SyntaxError: Unexpected token <

Pedro

On Sat, Jun 18, 2016 at 6:03 AM, Jacek Laskowski wrote:
> On Sat, Jun 18, 2016 at 6:13 AM, Pedro Rodriguez wrote:
>> using Datasets (eg using $ to select columns).
>
> Or even my favourite one - the tick ` :-)
>
> Jacek

--
Pedro Rodriguez
PhD Student in Distributed Machine Learning | CU Boulder
UC Berkeley AMPLab Alumni
ski.rodrig...@gmail.com | pedrorodriguez.io | 909-353-4423
Github: github.com/EntilZha | LinkedIn: https://www.linkedin.com/in/pedrorodriguezscience
Re: Spark 2.0 Dataset Documentation
On Sat, Jun 18, 2016 at 6:13 AM, Pedro Rodriguez wrote:
> using Datasets (eg using $ to select columns).

Or even my favourite one - the tick ` :-)

Jacek
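A tiny illustration of the shorthand being discussed - $ for selecting columns and backticks for escaping awkward column names. This is a sketch assuming Spark 2.0 and a made-up DataFrame; nothing here comes from the original thread:

{code}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("column-shorthand").master("local[*]").getOrCreate()
import spark.implicits._   // brings $"..." and the Symbol-to-Column conversion into scope

// Made-up data; the second column name contains a dot on purpose.
val df = Seq(("alice", 34), ("bob", 45)).toDF("name", "age.years")

df.select($"name").show()                 // $ builds a Column from its name
df.select('name, $"`age.years`").show()   // the tick 'name works too; backticks escape the dot
df.filter($"`age.years`" > 40).show()
{code}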
Re: [VOTE] Release Apache Spark 1.6.2 (RC1)
Looks like that's resolved now.

I will wait till Sunday to cut rc2 to give people more time to find issues with rc1.

On Fri, Jun 17, 2016 at 10:58 AM, Marcelo Vanzin wrote:
> -1 (non-binding)
>
> SPARK-16017 shows a severe perf regression in YARN compared to 1.6.1.
>
> On Thu, Jun 16, 2016 at 9:49 PM, Reynold Xin wrote:
>> Please vote on releasing the following candidate as Apache Spark version 1.6.2!
>>
>> The vote is open until Sunday, June 19, 2016 at 22:00 PDT and passes if a
>> majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Spark 1.6.2
>> [ ] -1 Do not release this package because ...
>>
>> The tag to be voted on is v1.6.2-rc1 (4168d9c94a9564f6b3e62f5d669acde13a7c7cf6)
>>
>> The release files, including signatures, digests, etc. can be found at:
>> https://home.apache.org/~pwendell/spark-releases/spark-1.6.2-rc1-bin/
>>
>> Release artifacts are signed with the following key:
>> https://people.apache.org/keys/committer/pwendell.asc
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1184
>>
>> The documentation corresponding to this release can be found at:
>> https://home.apache.org/~pwendell/spark-releases/spark-1.6.2-rc1-docs/
>>
>> == How can I help test this release? ==
>>
>> If you are a Spark user, you can help us test this release by taking an
>> existing Spark workload and running it on this release candidate, then
>> reporting any regressions from 1.6.1.
>>
>> == What justifies a -1 vote for this release? ==
>>
>> This is a maintenance release in the 1.6.x series. Bugs already present in
>> 1.6.1, missing features, or bugs related to new features will not
>> necessarily block this release.
>
> --
> Marcelo
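For anyone looking for a concrete way to help test, a minimal smoke-test sketch in the spirit of the paragraph above - a made-up workload, assuming the 1.6.2 RC1 artifacts from the staging repository are on the classpath; compare the output and timings against the same job on 1.6.1:

{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Made-up workload purely for exercising the release candidate; not from the original thread.
val conf = new SparkConf().setAppName("rc-smoke-test").setMaster("local[*]")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// A tiny aggregation that touches both the RDD and DataFrame APIs.
val counts = sc.parallelize(1 to 1000)
  .map(i => (i % 10, i))
  .toDF("bucket", "value")
  .groupBy("bucket")
  .count()

counts.show()   // each of the 10 buckets should report 100 rows
sc.stop()
{code}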