Re: How this unit test passed on master trunk?

2016-04-23 Thread Zhan Zhang
There are multiple records for the DF:

scala> structDF.groupBy($"a").agg(min(struct($"record.*"))).show
+---+-----------------------------+
|  a|min(struct(unresolvedstar()))|
+---+-----------------------------+
|  1|                        [1,1]|
|  3|                        [3,1]|
|  2|
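
For reference, a minimal spark-shell sketch of the query quoted above; the DF contents and the struct field names (x, y) are illustrative assumptions, not the actual data from the thread:

import org.apache.spark.sql.functions.{min, struct}
import spark.implicits._  // pre-imported in the 2.0 spark-shell; harmless to repeat

case class Record(x: Int, y: Int)

// Illustrative data: several records per key "a".
val structDF = Seq(
  (1, Record(1, 1)), (1, Record(1, 2)),
  (2, Record(2, 5)),
  (3, Record(3, 1)), (3, Record(3, 4))
).toDF("a", "record")

// min over a struct compares its fields left to right, so this returns the
// smallest (x, y) pair within each group of "a".
structDF.groupBy($"a").agg(min(struct($"record.*"))).show()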

Re: Spark writing to secure zone throws : UnknownCryptoProtocolVersionException

2016-04-23 Thread Ted Yu
Can you check that the DFSClient version Spark uses is the same as on the server side? The client and server (NameNode) negotiate a "crypto protocol version" - this is a forward-looking feature. Please note the line "Client provided: []", meaning the client didn't provide any supported crypto protocol
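
One quick way to confirm which Hadoop client bits Spark is actually running with (a sketch using the standard Hadoop VersionInfo utility; compare the output against the NameNode's version):

// Run in spark-shell: prints the Hadoop client version on Spark's classpath.
import org.apache.hadoop.util.VersionInfo

println(s"Hadoop client version: ${VersionInfo.getVersion}")
println(s"Build: ${VersionInfo.getBuildVersion}")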

Re: Spark SQL Transaction

2016-04-23 Thread Andrés Ivaldi
Thanks, I'll take a look at JdbcUtils. Regards. On Sat, Apr 23, 2016 at 2:57 PM, Todd Nist wrote: > I believe the class you are looking for is > org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala. > > By default in savePartition(...), it will do the following:

Re: filtering the correct value in Spark streaming data from Kafka

2016-04-23 Thread Mich Talebzadeh
The format of the data streaming in is only three columns, ID, Date and Signal > separated by comma > > 97,20160423-182633,93.19871243745806169848 > > So I want to pick up lines including Signal > "90.0" and discard the rest > > This is what I am getting from countByValueAndWindow
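
A hedged sketch of that filter, assuming `lines` is the DStream[String] of "ID,Date,Signal" values already extracted from the Kafka stream; the point is to compare the signal numerically rather than as a string:

val filtered = lines
  .map(_.split(","))                                   // Array(ID, Date, Signal)
  .filter(f => f.length == 3 && f(2).toDouble > 90.0)  // numeric comparison, not string
  .map { case Array(id, date, signal) => (id, date, signal.toDouble) }

filtered.print()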

Spark 2.0 Aggregator problems

2016-04-23 Thread Don Drake
I've downloaded a nightly build of Spark 2.0 (from today, 4/23) and was attempting to create an aggregator that will create a Seq of Rows, or specifically a Seq[Class1], where Class1 is my custom class. When I attempt to run the following code in a spark-shell, it errors out. Gist:
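
For reference (not the gist itself), a sketch of a collect-to-Seq Aggregator against the Spark 2.0 API; Class1's fields are invented, and Encoders.kryo is used here as a conservative choice for the buffer/output encoders of a Seq of a custom class:

import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator

case class Class1(id: Int, value: String)

// Collects all Class1 values seen for a key into one Seq[Class1].
object CollectClass1 extends Aggregator[Class1, Seq[Class1], Seq[Class1]] {
  def zero: Seq[Class1] = Seq.empty
  def reduce(buf: Seq[Class1], c: Class1): Seq[Class1] = buf :+ c
  def merge(b1: Seq[Class1], b2: Seq[Class1]): Seq[Class1] = b1 ++ b2
  def finish(buf: Seq[Class1]): Seq[Class1] = buf
  // Kryo encoders sidestep deriving an ExpressionEncoder for Seq[Class1].
  def bufferEncoder: Encoder[Seq[Class1]] = Encoders.kryo[Seq[Class1]]
  def outputEncoder: Encoder[Seq[Class1]] = Encoders.kryo[Seq[Class1]]
}

// Usage: ds.groupByKey(_.id).agg(CollectClass1.toColumn), where ds is a Dataset[Class1].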

filtering the correct value in Spark streaming data from Kafka

2016-04-23 Thread Mich Talebzadeh
lines including Signal > "90.0" and discard the rest. This is what I am getting from countByValueAndWindow.print():

---
Time: 146143749 ms
---
((98,3),1)
((40.80441152620633003508,3),1)
((60.71243694664215996759,3),1)
((95,3),1)
((57.23635208501673894915,3),1)
((20160423-193322,27),1)
(
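
Building on the filter sketched in the reply above (same assumption that `lines` is the DStream[String] of "ID,Date,Signal" records), counting only the qualifying signals within a window could look like this; the window and slide durations are illustrative:

import org.apache.spark.streaming.Seconds

// ssc.checkpoint(...) must be set, since countByValueAndWindow is stateful.
val highSignals = lines
  .map(_.split(","))
  .filter(f => f.length == 3 && f(2).toDouble > 90.0)
  .map(f => f(2))  // keep just the signal value

highSignals.countByValueAndWindow(Seconds(300), Seconds(10)).print()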

Re: Spark SQL Transaction

2016-04-23 Thread Todd Nist
I believe the class you are looking for is org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala. By default in savePartition(...), it will do the following:

if (supportsTransactions) {
  conn.setAutoCommit(false) // Everything in the same db transaction.
}

Then at line 224, it will
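
For context, a simplified standalone sketch of the same commit pattern (this is not the JdbcUtils source; the table name and column types are placeholders):

import java.sql.DriverManager

def savePartitionSketch(url: String, rows: Iterator[(Int, String)]): Unit = {
  val conn = DriverManager.getConnection(url)
  val supportsTransactions = conn.getMetaData.supportsTransactions()
  try {
    if (supportsTransactions) {
      conn.setAutoCommit(false) // everything in the same db transaction
    }
    val stmt = conn.prepareStatement("INSERT INTO target_table VALUES (?, ?)")
    try {
      rows.foreach { case (id, name) =>
        stmt.setInt(1, id)
        stmt.setString(2, name)
        stmt.addBatch()
      }
      stmt.executeBatch()
    } finally {
      stmt.close()
    }
    if (supportsTransactions) {
      conn.commit() // the commit referred to further down in savePartition
    }
  } finally {
    conn.close()
  }
}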

Re: Dataset aggregateByKey equivalent

2016-04-23 Thread Michael Armbrust
Have you looked at aggregators? https://docs.cloud.databricks.com/docs/spark/1.6/index.html#examples/Dataset%20Aggregator.html On Fri, Apr 22, 2016 at 6:45 PM, Lee Becker wrote: > Is there a way to do aggregateByKey on Datasets the way one can on an RDD? > > Consider the
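
A short sketch of what that looks like as an aggregateByKey-style sum over a Dataset, written against the Spark 2.0 Aggregator API; Event and its fields are made-up names:

import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator

case class Event(key: String, value: Long)

// Typed aggregation column that sums the values for each key.
val sumValues = new Aggregator[Event, Long, Long] {
  def zero: Long = 0L
  def reduce(b: Long, e: Event): Long = b + e.value
  def merge(b1: Long, b2: Long): Long = b1 + b2
  def finish(b: Long): Long = b
  def bufferEncoder: Encoder[Long] = Encoders.scalaLong
  def outputEncoder: Encoder[Long] = Encoders.scalaLong
}.toColumn

// ds.groupByKey(_.key).agg(sumValues) plays the role of
// rdd.aggregateByKey(zero)(seqOp, combOp) for a Dataset[Event].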

Re: Spark SQL Transaction

2016-04-23 Thread Mich Talebzadeh
In your JDBC connection you can do conn.commit(); or conn.rollback(). Why don't you insert your data into a #temp table in MSSQL and from there do one insert/select into the main table? That is, do it from the ETL side. In that case your main table will be protected: either it will have full data or no data. Also have
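
A hedged sketch of that staging-table pattern over plain JDBC (the URL and table names are placeholders; note that a MSSQL #temp table is session-scoped, so the insert/select would have to run on the same session that wrote it):

import java.sql.DriverManager

val jdbcUrl = "jdbc:sqlserver://host:1433;databaseName=db" // placeholder
val conn = DriverManager.getConnection(jdbcUrl)
try {
  conn.setAutoCommit(false)
  val stmt = conn.createStatement()
  // Move the staged rows into the main table in a single transaction:
  // either all of them land, or none do.
  stmt.executeUpdate("INSERT INTO main_table SELECT * FROM staging_table")
  stmt.executeUpdate("TRUNCATE TABLE staging_table")
  conn.commit()
} catch {
  case e: Exception =>
    conn.rollback() // main_table is left untouched on failure
    throw e
} finally {
  conn.close()
}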

Re: Spark SQL Transaction

2016-04-23 Thread Andrés Ivaldi
Hello, so I executed Profiler and found that implicit isolation was turned on by the JDBC driver; this is the default behavior of the MSSQL JDBC driver, but it's possible to change it with the setAutoCommit method. There is no connection property for that, so I have to do it in the code. Do you know where I can access the