Re: How this unit test passed on master trunk?

2016-04-23 Thread Zhan Zhang
There are multiple records for the DF:

scala> structDF.groupBy($"a").agg(min(struct($"record.*"))).show
+---+-----------------------------+
|  a|min(struct(unresolvedstar()))|
+---+-----------------------------+
|  1|                        [1,1]|
|  3|                        [3,1]|
|  2|
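
For reference, a minimal spark-shell sketch of the query quoted above; the DF contents and the struct field names (x, y) are illustrative assumptions, not the actual data from the thread:

import org.apache.spark.sql.functions.{min, struct}
import spark.implicits._  // pre-imported in the 2.0 spark-shell; harmless to repeat

case class Record(x: Int, y: Int)

// Illustrative data: several records per key "a".
val structDF = Seq(
  (1, Record(1, 1)), (1, Record(1, 2)),
  (2, Record(2, 5)),
  (3, Record(3, 1)), (3, Record(3, 4))
).toDF("a", "record")

// min over a struct compares its fields left to right, so this returns the
// smallest (x, y) pair within each group of "a".
structDF.groupBy($"a").agg(min(struct($"record.*"))).show()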

Re: Spark writing to secure zone throws : UnknownCryptoProtocolVersionException

2016-04-23 Thread Ted Yu
Can you check that the DFSClient version Spark uses is the same as on the server side? The client and server (NameNode) negotiate a "crypto protocol version" - this is a forward-looking feature. Please note the line "Client provided: []", meaning the client didn't provide any supported crypto protocol
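
One quick way to confirm which Hadoop client bits Spark is actually running with (a sketch using the standard Hadoop VersionInfo utility; compare the output against the NameNode's version):

// Run in spark-shell: prints the Hadoop client version on Spark's classpath.
import org.apache.hadoop.util.VersionInfo

println(s"Hadoop client version: ${VersionInfo.getVersion}")
println(s"Build: ${VersionInfo.getBuildVersion}")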

Re: Spark SQL Transaction

2016-04-23 Thread Andrés Ivaldi
Thanks, I'll take a look at JdbcUtils. Regards. On Sat, Apr 23, 2016 at 2:57 PM, Todd Nist wrote: > I believe the class you are looking for is > org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala. > > By default in savePartition(...), it will do the following:

Re: filtering the correct value in Spark streaming data from Kafka

2016-04-23 Thread Mich Talebzadeh
The format of the data streaming in is only three columns, ID, Date and Signal > separated by comma > > 97,20160423-182633,93.19871243745806169848 > > So I want to pick up lines including Signal > "90.0" and discard the rest > > This is what I am getting from countByValueAndWindow
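
A hedged sketch of that filter, assuming `lines` is the DStream[String] of "ID,Date,Signal" values already extracted from the Kafka stream; the point is to compare the signal numerically rather than as a string:

val filtered = lines
  .map(_.split(","))                                   // Array(ID, Date, Signal)
  .filter(f => f.length == 3 && f(2).toDouble > 90.0)  // numeric comparison, not string
  .map { case Array(id, date, signal) => (id, date, signal.toDouble) }

filtered.print()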

Spark 2.0 Aggregator problems

2016-04-23 Thread Don Drake
I've downloaded a nightly build of Spark 2.0 (from today, 4/23) and was attempting to create an aggregator that will create a Seq of Rows, or specifically a Seq[Class1], where Class1 is my custom class. When I attempt to run the following code in a spark-shell, it errors out. Gist:
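
For reference (not the gist itself), a sketch of a collect-to-Seq Aggregator against the Spark 2.0 API; Class1's fields are invented, and Encoders.kryo is used here as a conservative choice for the buffer/output encoders of a Seq of a custom class:

import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator

case class Class1(id: Int, value: String)

// Collects all Class1 values seen for a key into one Seq[Class1].
object CollectClass1 extends Aggregator[Class1, Seq[Class1], Seq[Class1]] {
  def zero: Seq[Class1] = Seq.empty
  def reduce(buf: Seq[Class1], c: Class1): Seq[Class1] = buf :+ c
  def merge(b1: Seq[Class1], b2: Seq[Class1]): Seq[Class1] = b1 ++ b2
  def finish(buf: Seq[Class1]): Seq[Class1] = buf
  // Kryo encoders sidestep deriving an ExpressionEncoder for Seq[Class1].
  def bufferEncoder: Encoder[Seq[Class1]] = Encoders.kryo[Seq[Class1]]
  def outputEncoder: Encoder[Seq[Class1]] = Encoders.kryo[Seq[Class1]]
}

// Usage: ds.groupByKey(_.id).agg(CollectClass1.toColumn), where ds is a Dataset[Class1].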

filtering the correct value in Spark streaming data from Kafka

2016-04-23 Thread Mich Talebzadeh
lines including Signal > "90.0" and discard the rest. This is what I am getting from countByValueAndWindow.print():

---
Time: 146143749 ms
---
((98,3),1)
((40.80441152620633003508,3),1)
((60.71243694664215996759,3),1)
((95,3),1)
((57.23635208501673894915,3),1)
((20160423-193322,27),1)
(
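
Building on the filter sketched in the reply above (same assumption that `lines` is the DStream[String] of "ID,Date,Signal" records), counting only the qualifying signals within a window could look like this; the window and slide durations are illustrative:

import org.apache.spark.streaming.Seconds

// ssc.checkpoint(...) must be set, since countByValueAndWindow is stateful.
val highSignals = lines
  .map(_.split(","))
  .filter(f => f.length == 3 && f(2).toDouble > 90.0)
  .map(f => f(2))  // keep just the signal value

highSignals.countByValueAndWindow(Seconds(300), Seconds(10)).print()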

Re: Spark SQL Transaction

2016-04-23 Thread Todd Nist
I believe the class you are looking for is org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala. By default in savePartition(...), it will do the following:

if (supportsTransactions) {
  conn.setAutoCommit(false) // Everything in the same db transaction.
}

Then at line 224, it will
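
For context, a simplified standalone sketch of the same commit pattern (this is not the JdbcUtils source; the table name and column types are placeholders):

import java.sql.DriverManager

def savePartitionSketch(url: String, rows: Iterator[(Int, String)]): Unit = {
  val conn = DriverManager.getConnection(url)
  val supportsTransactions = conn.getMetaData.supportsTransactions()
  try {
    if (supportsTransactions) {
      conn.setAutoCommit(false) // everything in the same db transaction
    }
    val stmt = conn.prepareStatement("INSERT INTO target_table VALUES (?, ?)")
    try {
      rows.foreach { case (id, name) =>
        stmt.setInt(1, id)
        stmt.setString(2, name)
        stmt.addBatch()
      }
      stmt.executeBatch()
    } finally {
      stmt.close()
    }
    if (supportsTransactions) {
      conn.commit() // the commit referred to further down in savePartition
    }
  } finally {
    conn.close()
  }
}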

Re: Dataset aggregateByKey equivalent

2016-04-23 Thread Michael Armbrust
Have you looked at aggregators? https://docs.cloud.databricks.com/docs/spark/1.6/index.html#examples/Dataset%20Aggregator.html On Fri, Apr 22, 2016 at 6:45 PM, Lee Becker wrote: > Is there a way to do aggregateByKey on Datasets the way one can on an RDD? > > Consider the
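
A short sketch of what that looks like as an aggregateByKey-style sum over a Dataset, written against the Spark 2.0 Aggregator API; Event and its fields are made-up names:

import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator

case class Event(key: String, value: Long)

// Typed aggregation column that sums the values for each key.
val sumValues = new Aggregator[Event, Long, Long] {
  def zero: Long = 0L
  def reduce(b: Long, e: Event): Long = b + e.value
  def merge(b1: Long, b2: Long): Long = b1 + b2
  def finish(b: Long): Long = b
  def bufferEncoder: Encoder[Long] = Encoders.scalaLong
  def outputEncoder: Encoder[Long] = Encoders.scalaLong
}.toColumn

// ds.groupByKey(_.key).agg(sumValues) plays the role of
// rdd.aggregateByKey(zero)(seqOp, combOp) for a Dataset[Event].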

Re: Spark SQL Transaction

2016-04-23 Thread Mich Talebzadeh
In your JDBC connection you can do conn.commit(); or conn.rollback(). Why don't you insert your data into a #temp table in MSSQL and from there do one insert/select into the main table? That is, do it from the ETL side. In that case your main table will be protected: either it will have full data or no data. Also have
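
A hedged sketch of that staging-table pattern over plain JDBC (the URL and table names are placeholders; note that a MSSQL #temp table is session-scoped, so the insert/select would have to run on the same session that wrote it):

import java.sql.DriverManager

val jdbcUrl = "jdbc:sqlserver://host:1433;databaseName=db" // placeholder
val conn = DriverManager.getConnection(jdbcUrl)
try {
  conn.setAutoCommit(false)
  val stmt = conn.createStatement()
  // Move the staged rows into the main table in a single transaction:
  // either all of them land, or none do.
  stmt.executeUpdate("INSERT INTO main_table SELECT * FROM staging_table")
  stmt.executeUpdate("TRUNCATE TABLE staging_table")
  conn.commit()
} catch {
  case e: Exception =>
    conn.rollback() // main_table is left untouched on failure
    throw e
} finally {
  conn.close()
}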

Re: Spark SQL Transaction

2016-04-23 Thread Andrés Ivaldi
Hello, so I executed Profiler and found that implicit isolation was turned on by the JDBC driver; this is the default behavior of the MSSQL JDBC driver, but it's possible to change it with the setAutoCommit method. There is no connection property for that, so I have to do it in the code. Do you know where I can access the