Re: Quick one... AWS SDK version?

2017-10-07 Thread Jean Georges Perrin
Hey Marco, I am actually reading from S3 and I use 2.7.3, but I inherited the project and it uses some AWS APIs from the Amazon SDK, whose version is like from yesterday :) so it’s confusing, and AMZ is changing its version like crazy, so it’s a little difficult to follow. Right now I went back to

Re: Quick one... AWS SDK version?

2017-10-07 Thread Marco Mistroni
Hi JG, out of curiosity, what's your use case? Are you writing to S3? You could use Spark to do that, e.g. using the Hadoop package org.apache.hadoop:hadoop-aws:2.7.1. That should download the AWS client that is in line with Hadoop 2.7.1. HTH, Marco On Fri, Oct 6, 2017 at 10:58 PM, Jonathan Kelly

Cases when to clear the checkpoint directories.

2017-10-07 Thread John, Vishal (Agoda)
Hello TD, You had replied to one of the questions about checkpointing – This is an unfortunate design on my part when I was building DStreams :) Fortunately, we learnt from our mistakes and built Structured Streaming the correct way. Checkpointing in Structured Streaming stores only the

Re: DataFrame multiple agg on the same column

2017-10-07 Thread yohann jardin
Hey Somasundaram, Using a map is only one way to use the function agg. For the complete list: https://spark.apache.org/docs/1.5.2/api/java/org/apache/spark/sql/GroupedData.html Using the first one:

DataFrame multiple agg on the same column

2017-10-07 Thread Somasundaram Sekar
Hi, I have a GroupedData object on which I perform aggregations over a few columns. Since GroupedData takes in a map, I cannot perform multiple aggregates on the same column; say I want to have both the max and min of amount. So the below line of code will return only one aggregate per column

Re: How to convert Array of Json rows into Dataset of specific columns in Spark 2.2.0?

2017-10-07 Thread Jules Damji
You might find these blogs helpful to parse & extract data from complex structures: https://databricks.com/blog/2017/06/27/4-sql-high-order-lambda-functions-examine-complex-structured-data-databricks.html

[spark-core] SortShuffleManager - when to enable Serialized sorting

2017-10-07 Thread Weitong Chen
Hi, why does canUseSerializedShuffle check dependency.aggregator but not dependency.mapSideCombine? In BaseShuffle' SortShuffleWriter, dep.mapSideCombine decides whether dep.aggregator is passed to the sorter or not. *canUseSerializedShuffle* /** * Helper method for determining whether a

Re: How to convert Array of Json rows into Dataset of specific columns in Spark 2.2.0?

2017-10-07 Thread Matteo Cossu
Hello, I think you should use *from_json* from spark.sql.functions to parse the JSON string and convert it to a StructType. Afterwards, you

How to convert Array of Json rows into Dataset of specific columns in Spark 2.2.0?

2017-10-07 Thread kant kodali
I have a Dataset ds which consists of json rows. *Sample Json Row (This is just an example of one row in the dataset)* [ {"name": "foo", "address": {"state": "CA", "country": "USA"}, "docs":[{"subject": "english", "year": 2016}]} {"name": "bar", "address": {"state": "OH", "country":

Re: TallSkinnyQR

2017-10-07 Thread Xianyang Liu
On Oct 7, 2017, at 5:29 AM, Iman Mohtashemi wrote: Hi guys, Here is another problem I encountered using the tallskinny QR. I've attached some clear documentation of the problem. I posted it on the forum but I'm not sure if it went through. Best regards, Iman On Fri, Dec 30, 2016 at 9:22 AM