SparkSQL read Hive transactional table

2018-10-15 Thread daily
Hi, I use the HCatalog Streaming Mutation API to write data to a Hive transactional table, and then I use SparkSQL to read data from that table. I get the right result. However, SparkSQL takes more time to read the Hive ORC bucketed transactional table, because SparkSQL…
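For context, a minimal sketch of the read path being described, assuming an active Hive-enabled SparkSession; the table name db.acid_table is hypothetical:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("ReadHiveTransactionalTable")
  .enableHiveSupport() // required so Spark can resolve tables in the Hive metastore
  .getOrCreate()

// SparkSQL reads the ORC bucketed transactional table through the metastore.
val df = spark.sql("SELECT * FROM db.acid_table")
df.show()
```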

Re: Spark seems to think that a particular broadcast variable is large in size

2018-10-15 Thread Dillon Dukek
In your program, persist the smaller table and call count to force it to materialize. Then, in the Spark UI, go to the Storage tab; the size of your table as Spark sees it should be displayed there. Out of curiosity, what version/language of Spark are you using? On Mon, Oct 15, 2018 at 11:53 AM…
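A sketch of the suggestion, assuming an active SparkSession named spark; the table and join-column names are hypothetical:

```scala
import org.apache.spark.sql.functions.broadcast
import org.apache.spark.storage.StorageLevel

// Persist the smaller table and force materialization with count(),
// then inspect its in-memory size under the Storage tab in the Spark UI.
val small = spark.table("small_table").persist(StorageLevel.MEMORY_AND_DISK)
small.count() // action that forces the persisted data to materialize

// Once the size looks reasonable, broadcast it explicitly in the join.
val joined = spark.table("large_table").join(broadcast(small), Seq("id"))
```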

RE: kerberos auth for MS SQL server jdbc driver

2018-10-15 Thread Luca Canali
We have a case where we interact with a Kerberized service and found a simple workaround: distribute the driver's Kerberos credential cache file and make use of it in the executors. Maybe some of the ideas there can be of help for this case too? Our case is on Linux, though. Details: …
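A rough sketch of what such a workaround might look like, under the assumption that a valid credential cache file (the path below is hypothetical) is shipped to every executor and pointed to via KRB5CCNAME; details such as a FILE: prefix for KRB5CCNAME may vary by environment:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("KerberizedService")
  // Ship the credential cache file to the executors' working directories.
  .config("spark.files", "/tmp/krb5cc_app")
  // Tell the Kerberos libraries on the executors where to find it.
  .config("spark.executorEnv.KRB5CCNAME", "krb5cc_app")
  .getOrCreate()
```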

Spark seems to think that a particular broadcast variable is large in size

2018-10-15 Thread Venkat Dabri
I am trying to do a broadcast join on two tables. The size of the smaller table varies based upon the parameters, but the size of the larger table is close to 2 TB. What I have noticed is that if I don't set spark.sql.autoBroadcastJoinThreshold to 10G, some of these operations do a…
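A sketch of the two usual ways to force the broadcast behavior being discussed, assuming an active SparkSession named spark; table and column names are hypothetical:

```scala
import org.apache.spark.sql.functions.broadcast

// Raise the auto-broadcast threshold (10 GB, as in the message)...
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 10L * 1024 * 1024 * 1024)

val small = spark.table("small_table") // size varies with parameters
val large = spark.table("large_table") // ~2 TB

// ...or bypass Spark's size estimate entirely with an explicit hint.
val joined = large.join(broadcast(small), Seq("key"))
```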

Re: kerberos auth for MS SQL server jdbc driver

2018-10-15 Thread Marcelo Vanzin
Spark only does Kerberos authentication on the driver. For executors, it currently only supports Hadoop's delegation tokens for Kerberos. To use something that does not support delegation tokens, you have to manually manage the Kerberos login in the code that runs in your executors, which might be…
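A sketch of what that manual executor-side login might look like; the principal, keytab path, and table name are hypothetical, and the keytab must already be present on every executor (e.g., shipped with --files):

```scala
import org.apache.hadoop.security.UserGroupInformation
import org.apache.spark.sql.Row

val df = spark.table("events")
df.foreachPartition { rows: Iterator[Row] =>
  // Log in once per partition before touching the Kerberized service.
  UserGroupInformation.loginUserFromKeytab("app@EXAMPLE.COM", "app.keytab")
  rows.foreach(row => println(row)) // placeholder for the Kerberized call
}
```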

Support nested keys in DataFrameWriter.bucketBy

2018-10-15 Thread Dávid Szakállas
Currently (in Spark 2.3.1) we cannot bucket DataFrames by nested columns; e.g., df.write.bucketBy(10, "key.a").saveAsTable("junk") will result in the following exception: org.apache.spark.sql.AnalysisException: bucket column key.a is not defined in table junk, defined table columns are: key, …
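One possible workaround (a sketch, not an official fix) is to promote the nested field to a top-level column before bucketing; df, the column, and the table name are taken from the example above:

```scala
import org.apache.spark.sql.functions.col

// Pull the nested field up to a top-level column...
val flattened = df.withColumn("key_a", col("key.a"))

// ...so the analyzer accepts it as a bucket column.
flattened.write
  .bucketBy(10, "key_a")
  .saveAsTable("junk")
```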

Re: Timestamp Difference/operations

2018-10-15 Thread Brandon Geise
How about select unix_timestamp(timestamp2) - unix_timestamp(timestamp1)? From: Paras Agarwal Date: Monday, October 15, 2018 at 2:41 AM To: John Zhuge Cc: user, dev Subject: Re: Timestamp Difference/operations Thanks John, actually I need the full date and time difference, not just…
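A minimal sketch of that suggestion; the table and column names are hypothetical. unix_timestamp() yields seconds since the epoch, so the subtraction gives the difference in whole seconds:

```scala
val diff = spark.sql(
  "SELECT unix_timestamp(timestamp2) - unix_timestamp(timestamp1) AS diff_seconds FROM events")
diff.show()
```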

Re: Spark Structured Streaming resource contention / memory issue

2018-10-15 Thread Patrick McGloin
Hi Jungtaek, Thanks, we thought that might be the issue but haven't tested it yet, as building against an unreleased version of Spark is tough for us due to network restrictions. We will try, though, and I will report back if we find anything. Best regards, Patrick On Fri, Oct 12, 2018, 2:57 PM…

kerberos auth for MS SQL server jdbc driver

2018-10-15 Thread Foster Langbein
Has anyone gotten Spark to write to SQL Server using Kerberos authentication with Microsoft's JDBC driver? I'm having limited success, though in theory it should work. I'm using a 4-node Spark 2.3.0 cluster in YARN mode and trying to write a simple table to SQL Server 2016. I can get it to work if I…
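A hedged sketch of the kind of write being attempted. The connection options (integratedSecurity and authenticationScheme=JavaKerberos) are the ones Microsoft's JDBC driver documents for Kerberos; the host, database, and table names are hypothetical:

```scala
val url = "jdbc:sqlserver://sqlhost:1433;databaseName=testdb;" +
  "integratedSecurity=true;authenticationScheme=JavaKerberos"

val df = spark.range(10).toDF("id") // stand-in for the simple table
df.write
  .format("jdbc")
  .option("url", url)
  .option("dbtable", "dbo.simple_table")
  .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
  .save()
```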

Re: Timestamp Difference/operations

2018-10-15 Thread Paras Agarwal
Thanks John, actually I need the full date and time difference, not just the date difference, which I guess is not supported. Let me know if it's possible, or if any UDF is available for the same. Thanks and regards, Paras From: John Zhuge Sent: Friday, October 12, 2018…
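One way to get a full date-and-time difference without a UDF is to compute the gap in seconds and break it into components; a sketch with hypothetical table and column names:

```scala
val diff = spark.sql("""
  SELECT
    CAST(d / 86400 AS INT)        AS days,
    CAST(d % 86400 / 3600 AS INT) AS hours,
    CAST(d % 3600 / 60 AS INT)    AS minutes,
    CAST(d % 60 AS INT)           AS seconds
  FROM (
    SELECT unix_timestamp(timestamp2) - unix_timestamp(timestamp1) AS d
    FROM events
  ) t
""")
diff.show()
```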