Hi,
I use the HCatalog Streaming Mutation API to write data to a Hive transactional
table, and then I use Spark SQL to read data from that table. I get the right
result.
However, Spark SQL takes more time to read a Hive ORC bucketed transactional
table, because Spark SQL
Best Regards,
Vamshi T
In your program, persist the smaller table and use count to force it to
materialize. Then, in the Spark UI, go to the Storage tab. The size of your
table as Spark sees it should be displayed there. Out of curiosity, what
version/language of Spark are you using?
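A minimal sketch of the above, assuming a local session and a toy stand-in for the smaller table (all names and values here are illustrative):

```scala
import org.apache.spark.sql.SparkSession

// Illustrative local session; on a cluster you would use your existing one.
val spark = SparkSession.builder().master("local[*]").appName("size-check").getOrCreate()
import spark.implicits._

// Stand-in for your smaller table.
val smallDf = Seq((1, "a"), (2, "b")).toDF("id", "v")

val cached = smallDf.persist()
cached.count() // forces materialization; the size then shows under Spark UI -> Storage
```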
On Mon, Oct 15, 2018 at 11:53 AM
We have a case where we interact with a Kerberized service and found a simple
workaround: distribute the driver's Kerberos credential cache file and use it
in the executors. Maybe some of the ideas there can be of help for this case
too? Our case is on Linux, though. Details:
I am trying to do a broadcast join on two tables. The size of the
smaller table will vary based upon the parameters, but the size of the
larger table is close to 2 TB. What I have noticed is that if I don't
set spark.sql.autoBroadcastJoinThreshold to 10G, some of these
operations do a
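For what it's worth, both ways of getting a broadcast can be sketched like this (the 10 GB value and the toy tables are illustrative, not the poster's actual data):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

val spark = SparkSession.builder().master("local[*]").appName("bcast").getOrCreate()
import spark.implicits._

// Raise the auto-broadcast threshold (here: 10 GB, expressed in bytes).
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 10L * 1024 * 1024 * 1024)

val large = Seq((1, "x"), (2, "y")).toDF("id", "a") // stand-in for the ~2 TB table
val small = Seq((1, "s")).toDF("id", "b")           // stand-in for the smaller table

// Alternatively, force a broadcast for this one join regardless of the threshold.
val joined = large.join(broadcast(small), "id")
```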
Spark only does Kerberos authentication on the driver. For executors it
currently only supports Hadoop's delegation tokens for Kerberos.
To use something that does not support delegation tokens, you have to
manually manage the Kerberos login in your code that runs in executors,
which might be
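One way to manage that login manually can be sketched with Hadoop's UserGroupInformation. The principal, the keytab path, and the assumption that the keytab is already present on every executor (e.g. shipped with --files) are all placeholders here, not a confirmed recipe:

```scala
import org.apache.hadoop.security.UserGroupInformation

// Placeholder principal and keytab path; the keytab must already exist on each executor.
def kerberosLogin(): Unit =
  UserGroupInformation.loginUserFromKeytab("user@EXAMPLE.COM", "/tmp/user.keytab")

// Executor-side usage, e.g. inside a mapPartitions body:
// rdd.mapPartitions { iter =>
//   kerberosLogin() // authenticate before contacting the Kerberized service
//   iter.map(process)
// }
```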
Currently (in Spark 2.3.1) we cannot bucket DataFrames by nested columns; e.g.,
df.write.bucketBy(10, "key.a").saveAsTable("junk")
will result in the following exception:
org.apache.spark.sql.AnalysisException: bucket column key.a is not defined in
table junk, defined table columns are: key,
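A possible workaround, assuming it is acceptable to pull the nested field up into a top-level column first (the table name, column names, and toy data below are made up):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").appName("bucket-nested").getOrCreate()

// Toy frame with a struct column "key" holding a field "a".
val df = spark.range(3).selectExpr("named_struct('a', cast(id as string)) as key", "id as value")

// Promote key.a to a top-level column, then bucket by that.
df.withColumn("key_a", col("key.a"))
  .write.mode("overwrite")
  .bucketBy(10, "key_a")
  .saveAsTable("junk_flat")
```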
How about
select unix_timestamp(timestamp2) - unix_timestamp(timestamp1)?
From: Paras Agarwal
Date: Monday, October 15, 2018 at 2:41 AM
To: John Zhuge
Cc: user, dev
Subject: Re: Timestamp Difference/operations
Thanks John,
Actually I need the full date and time difference, not just
Hi Jungtaek,
Thanks, we thought that might be the issue but haven't tested yet as
building against an unreleased version of Spark is tough for us, due to
network restrictions. We will try though. I will report back if we find
anything.
Best regards,
Patrick
On Fri, Oct 12, 2018, 2:57 PM
Has anyone gotten Spark to write to SQL Server using Kerberos
authentication with Microsoft's JDBC driver? I'm having limited success,
though in theory it should work.
I'm using a YARN-mode, 4-node Spark 2.3.0 cluster and trying to write a
simple table to SQL Server 2016. I can get it to work if I
Thanks John,
Actually I need the full date and time difference, not just the date
difference, which I guess is not supported.
Let me know if it's possible, or if any UDF is available for the same.
Thanks and regards,
Paras
From: John Zhuge
Sent: Friday, October 12, 2018
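If the goal is a full day/hour/minute/second breakdown rather than a bare seconds value, the arithmetic on top of that unix_timestamp difference can be sketched outside Spark with java.time (the two timestamps below are made up for illustration):

```scala
import java.time.{Duration, LocalDateTime}

val t1 = LocalDateTime.of(2018, 10, 12, 8, 30, 0)
val t2 = LocalDateTime.of(2018, 10, 15, 11, 53, 20)
val d  = Duration.between(t1, t2)

// Decompose the total duration into full date-and-time components.
val days    = d.toDays
val hours   = d.toHours % 24
val minutes = d.toMinutes % 60
val seconds = d.getSeconds % 60
println(s"$days days, $hours h $minutes m $seconds s")
```

The same decomposition could be wrapped in a UDF if it has to stay inside Spark SQL.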