Re: [External Sender] Writing dataframe to vertica

2018-10-16 Thread Femi Anthony
How are you trying to write to Vertica? Can you provide some snippets of code? Femi On Tue, Oct 16, 2018 at 7:24 PM Nikhil Goyal wrote: > Hi guys, > > I am trying to write a dataframe to Vertica using Spark. It seems like Spark > is creating a temp table under the public schema. I don't have

What exactly is function shipping?

2018-10-16 Thread kant kodali
Hi All, Everyone talks about how easy function shipping is in Scala. I immediately go "wait a minute": isn't it just the object serialization and deserialization that has existed in Java for a long time, or am I missing something profound here? Thanks!
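Nothing profound, as far as I can tell: a Scala function value is an ordinary object, so plain Java object serialization can ship it, which is essentially what Spark does with task closures. A minimal sketch (pure JVM, no Spark) that round-trips a closure:

    import java.io._

    // A Scala function literal compiles to a serializable object, so the
    // standard Java object streams can "ship" it across a wire.
    val addOne: Int => Int = x => x + 1

    val buf = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buf)
    out.writeObject(addOne)        // serialize the closure, as Spark does for tasks
    out.close()

    val in = new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray))
    val revived = in.readObject().asInstanceOf[Int => Int]
    println(revived(41))           // 42: the deserialized function still runs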

Re: kerberos auth for MS SQL server jdbc driver

2018-10-16 Thread Foster Langbein
Thanks Luca, that seems like a neat workaround. I tried a bit to get this to work - I'm using spark-submit, but I thought the same idea should work. Do you know if this technique must use a TGT file that matches the user the Spark job executes as? In my case I want to use a separate service account known by

Re: kerberos auth for MS SQL server jdbc driver

2018-10-16 Thread Foster Langbein
Thanks Marcelo, that makes a lot of sense to me now. Do you know if there are any plans to expand Kerberos auth to the executors? The current executor behaviour is quite curious - you can see in the trace information that it consumes the jaas conf file and keytab (indeed they're required - it will
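For reference, the pattern under discussion is usually wired up roughly like the sketch below (file and class names are hypothetical, and whether this works the same in all deploy modes is not guaranteed); whether the executor-side login can use a service account different from the user the job runs as is exactly the open question in this thread:

    spark-submit \
      --files jaas.conf,service.keytab \
      --conf "spark.driver.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf" \
      --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf" \
      --class com.example.MyJob my-job.jar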

Re: SparkSQL read Hive transactional table

2018-10-16 Thread daily
Hi, Spark version: 2.3.0, Hive version: 2.1.0. Best regards. ------ Original message ------ From: "Gourav Sengupta"; Date: 2018-10-16 (Tue) 6:35; To: "daily"; Cc: "user", "dev"; Subject: Re: SparkSQL read Hive transactional table Hi, can I please

Writing dataframe to vertica

2018-10-16 Thread Nikhil Goyal
Hi guys, I am trying to write a dataframe to Vertica using Spark. It seems like Spark is creating a temp table under the public schema. I don't have access to the public schema, hence the job is failing. Is there a way to specify another schema? Error: ERROR s2v.S2VUtils: createJobStatusTable: FAILED to
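The S2V classes in the error suggest the Vertica Spark connector; if so, a hedged sketch of pointing the writer at a different schema would look like the following, where all connection values are placeholders and the "dbschema" option name is an assumption to be checked against your connector's documentation:

    // hypothetical connection values; "dbschema" is the assumed option for
    // writing under a schema other than public
    df.write
      .format("com.vertica.spark.datasource.DefaultSource")
      .option("host", "vertica-host")
      .option("db", "mydb")
      .option("user", "myuser")
      .option("password", "***")
      .option("table", "my_table")
      .option("dbschema", "my_schema")
      .mode("append")
      .save()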

Application crashes when encountering oracle timestamp

2018-10-16 Thread rishmanisation
I am writing a Spark application to profile an Oracle database. The application works perfectly without any timestamp columns, but when I do try to profile a database with a timestamp column I run into the following error: Exception in thread "main" java.sql.SQLException: Unrecognized SQL type
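With Oracle, "Unrecognized SQL type" often points at a TIMESTAMP WITH (LOCAL) TIME ZONE column that Spark's JDBC dialect does not map. A common workaround, sketched here with hypothetical connection, table, and column names, is to cast the column inside the pushed-down query:

    // placeholders throughout; the point is the CAST inside dbtable
    val df = spark.read
      .format("jdbc")
      .option("url", "jdbc:oracle:thin:@//dbhost:1521/service")
      .option("user", "scott")
      .option("password", "***")
      .option("dbtable",
        "(SELECT id, CAST(created_at AS TIMESTAMP) AS created_at FROM my_table) t")
      .load()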

[Spark UI] Spark 2.3.1 UI no longer respects spark.ui.retainedJobs

2018-10-16 Thread Patrick Brown
I recently upgraded to Spark 2.3.1. I have had these same settings in my spark-submit script, which worked on 2.0.2 and which, according to the documentation, have not changed:

    spark.ui.retainedTasks=1
    spark.ui.retainedStages=1
    spark.ui.retainedJobs=1

However, in 2.3.1 the UI doesn't seem to
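For anyone reproducing this, the same keys can also be set programmatically; a minimal sketch:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("ui-retention-repro")
      .config("spark.ui.retainedJobs", "1")
      .config("spark.ui.retainedStages", "1")
      .config("spark.ui.retainedTasks", "1")
      .getOrCreate()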

Re: Spark seems to think that a particular broadcast variable is large in size

2018-10-16 Thread Dillon Dukek
You keep mentioning that you're viewing this after the fact in the Spark history server. Also, the spark-shell isn't a UI, so I'm not sure what you mean by saying that the Storage tab is blank in the spark-shell. Just so I'm clear about what you're doing, are you looking at this info while your

Cached data not showing up in Storage tab

2018-10-16 Thread Venkat Dabri
When I cache a variable the data never shows up in the Storage tab. The Storage tab is always blank. I have tried it in Zeppelin as well as spark-shell.

    scala> val classCount = spark.read.parquet("s3:// /classCount")
    scala> classCount.persist
    scala> classCount.count

Nothing shows up in the
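A quick way to verify the persist independently of the UI, sketched for a spark-shell session (the bucket name is a placeholder):

    val classCount = spark.read.parquet("s3://<bucket>/classCount")
    classCount.persist()
    classCount.count()                   // an action is needed to materialize the cache
    println(classCount.storageLevel)     // the effective StorageLevel, if any
    println(spark.sparkContext.getPersistentRDDs.size)  // > 0 once something is cached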

Re: Spark seems to think that a particular broadcast variable is large in size

2018-10-16 Thread Venkat Dabri
The same problem is mentioned here : https://forums.databricks.com/questions/117/why-is-my-rdd-not-showing-up-in-the-storage-tab-of.html https://stackoverflow.com/questions/44792213/blank-storage-tab-in-spark-history-server On Tue, Oct 16, 2018 at 8:06 AM Venkat Dabri wrote: > > I did try that

Re: Spark seems to think that a particular broadcast variable is large in size

2018-10-16 Thread Venkat Dabri
I did try that mechanism before but the data never shows up in the Storage tab. The Storage tab is always blank. I have tried it in Zeppelin as well as spark-shell.

    scala> val classCount = spark.read.parquet("s3:// /classCount")
    scala> classCount.persist
    scala> classCount.count

Nothing shows

Re: [External Sender] Pyspark Window orderBy

2018-10-16 Thread mhussain
Yes, I did try it and you are right, it behaves the same so far. I am not sure how it's going to behave for large data sets, though. I don't see anything in the documentation confirming this behavior. -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

Re: [External Sender] Pyspark Window orderBy

2018-10-16 Thread Femi Anthony
I think that’s how it should behave. Did you try it out and see? On Tue, Oct 16, 2018 at 5:11 AM mhussain wrote:

> Hi,
>
> I have a dataframe which looks like
>
> +--------+---+-----+----+
> |group_id| id| text|type|
> +--------+---+-----+----+
> |       1|  1|  one|   a|
> |       1|  1|

Re: SparkSQL read Hive transactional table

2018-10-16 Thread Gourav Sengupta
Hi, can I please ask which versions of Hive and Spark you are using? Regards, Gourav Sengupta On Tue, Oct 16, 2018 at 2:42 AM daily wrote: > Hi, > > I use the HCatalog Streaming Mutation API to write data to a Hive transactional > table, and then I use SparkSQL to read data from the Hive

Pyspark Window orderBy

2018-10-16 Thread mhussain
Hi, I have a dataframe which looks like

+--------+---+-----+----+
|group_id| id| text|type|
+--------+---+-----+----+
|       1|  1|  one|   a|
|       1|  1|  two|   t|
|       1|  2|three|   a|
|       1|  2| four|   t|
|       1|  5| five|   a|
|       1|  6|  six|   t|
|       1|
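For what it's worth, the behaviour is easy to probe on a small frame; a self-contained Scala sketch of the same data (the PySpark Window API mirrors it):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.collect_list

    val spark = SparkSession.builder().master("local[*]").appName("win").getOrCreate()
    import spark.implicits._

    val df = Seq(
      (1, 1, "one", "a"), (1, 1, "two", "t"),
      (1, 2, "three", "a"), (1, 2, "four", "t"),
      (1, 5, "five", "a"), (1, 6, "six", "t")
    ).toDF("group_id", "id", "text", "type")

    // Rows in each group_id partition are consumed in id order; note the
    // default frame with an orderBy runs from unbounded preceding to the
    // current row, not over the whole partition.
    val w = Window.partitionBy("group_id").orderBy("id")
    df.withColumn("texts", collect_list("text").over(w)).show(false)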

Re: Timestamp Difference/operations

2018-10-16 Thread Paras Agarwal
Thanks Srabasti, I am trying to convert Teradata to Spark SQL.

TERADATA:

    SELECT * FROM Table1
    WHERE DATE '1974-01-02' > CAST(birth_date AS TIMESTAMP(0))
          + (TIME '12:34:34' - TIME '00:00:00' HOUR TO SECOND);

HIVE (with some tweaks I can write):

    SELECT * FROM foodmart.trimmed_employee WHERE
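A hedged sketch of the Spark SQL equivalent, assuming the HOUR TO SECOND arithmetic reduces to a fixed 12:34:34 offset:

    spark.sql("""
      SELECT *
      FROM foodmart.trimmed_employee
      WHERE DATE '1974-01-02' >
            CAST(birth_date AS TIMESTAMP) + INTERVAL 12 HOURS 34 MINUTES 34 SECONDS
    """).show()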

SocketTimeoutException with spark-r and using latest R version

2018-10-16 Thread Thijs Haarhuis
Hi all, I am running into a problem where once in a while my job gives me the following exception(s):

    java.net.SocketTimeoutException: Accept timed out
        at java.net.PlainSocketImpl.socketAccept(Native Method)
        at
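If the accept timeout occurs while the R backend waits for the R process to connect back, one knob that may be relevant (an assumption; it depends on where the accept actually happens) is spark.r.backendConnectionTimeout, available since Spark 2.1; a sketch:

    spark-submit \
      --conf spark.r.backendConnectionTimeout=12000 \
      my_job.R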