Spark SQL ODBC/JDBC driver that supports Kerberos delegation

2019-02-11 Thread luby
Hi All, We want to use Spark SQL in Tableau, but according to https://onlinehelp.tableau.com/current/pro/desktop/en-us/examples_sparksql.htm the driver provided by Tableau doesn't support Kerberos delegation. Is there any Spark SQL ODBC or JDBC driver that supports Kerberos delegation?
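For context, the Spark Thrift Server speaks the HiveServer2 wire protocol, so the standard Hive JDBC driver can connect to it with Kerberos by carrying the service principal in the URL. A minimal sketch, assuming placeholder host, port, and principal; whether a given BI tool can delegate the end user's identity this way depends on the driver and cluster configuration:

    import java.sql.DriverManager

    // Hypothetical coordinates for a kerberized Spark Thrift Server.
    // Requires the Hive JDBC driver on the classpath, and the client is
    // expected to hold a valid Kerberos ticket (kinit) before connecting;
    // "principal" names the server's Kerberos principal.
    val url = "jdbc:hive2://thrift.example.com:10000/default;" +
      "principal=hive/_HOST@EXAMPLE.COM"

    val conn = DriverManager.getConnection(url)
    val rs = conn.createStatement().executeQuery("SELECT 1")
    while (rs.next()) println(rs.getInt(1))
    conn.close()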

Create Hive table from CSV file

2019-02-11 Thread Soheil Pourbafrani
Hi, Using the following code I create, through the Thrift Server, a Hive table from a CSV file, and I expect it to treat the first line as a header; but when I select data from the resulting table, the CSV header shows up as a data row! It seems the line "TBLPROPERTIES(skip.header.line.count =
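The quoted snippet is truncated, so the exact original statement is unknown, but a minimal sketch of the DDL in question, with a hypothetical table name, schema, and path, might look like this:

    // Note: the property key must be quoted; also, some Spark versions
    // ignore skip.header.line.count when reading the table natively,
    // which produces exactly the symptom described above.
    spark.sql("""
      CREATE EXTERNAL TABLE people (id INT, name STRING)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      LOCATION '/tmp/people_csv'
      TBLPROPERTIES ("skip.header.line.count" = "1")
    """)

An alternative that sidesteps the table property entirely is Spark's CSV reader, spark.read.option("header", "true").csv(path), which drops the header at read time.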

Re: Spark on YARN, HowTo kill executor or individual task?

2019-02-11 Thread Vadim Semenov
Something like this:

    import org.apache.spark.TaskContext

    ds.map(r => {
      val taskContext = TaskContext.get()
      if (taskContext.partitionId == 1000) {
        throw new RuntimeException
      }
      r
    })

On Mon, Feb 11, 2019 at 8:41 AM Serega Sheypak wrote:
> I need to crash task which does repartition.

Re: structured streaming handling validation and json flattening

2019-02-11 Thread Jacek Laskowski
Hi Lian, "What have you tried?" would be a good starting point. Any help on this? How do you read the JSONs? readStream.json? You could use readStream.text followed by a filter to include/exclude good/bad JSONs. Regards, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL
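A minimal sketch of the read-as-text-then-filter approach described here; the schema and input path are hypothetical:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, from_json}
    import org.apache.spark.sql.types.{LongType, StringType, StructType}

    val spark = SparkSession.builder.appName("json-validation").getOrCreate()

    // Hypothetical schema for the expected JSON records.
    val schema = new StructType().add("id", LongType).add("name", StringType)

    // Read each line as raw text, then attempt to parse it.
    val raw = spark.readStream.text("/tmp/incoming")
    val parsed = raw.select(col("value"), from_json(col("value"), schema).as("json"))

    // Depending on the Spark version, a malformed line yields either a null
    // struct or a struct of all-null fields, so checking a required field
    // (id here) is the more portable validity test.
    val isValid = col("json").isNotNull && col("json.id").isNotNull
    val good = parsed.filter(isValid).select("json.*")  // flattened valid rows
    val bad  = parsed.filter(!isValid).select("value")  // malformed raw lines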

Re: Spark on YARN, HowTo kill executor or individual task?

2019-02-11 Thread Serega Sheypak
I need to crash a task which does repartition.
Mon, 11 Feb 2019 at 10:37, Gabor Somogyi:
> What blocks you from putting if conditions inside the mentioned map function?
> On Mon, Feb 11, 2019 at 10:31 AM Serega Sheypak wrote:
>> Yeah, but I don't need to crash the entire app, I want to fail

RE: Multiple column aggregations

2019-02-11 Thread Shiva Prashanth Vallabhaneni
Hi Sonu, You could use a query similar to the one below, and you could further optimize it by adding a WHERE clause. I would suggest that you benchmark the performance of both approaches (multiple group-by queries vs a single query with multiple window functions) before
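The quoted query itself is truncated, but for illustration, a minimal sketch of the single-query, multiple-window-function approach over a hypothetical sales dataset:

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.sum

    // Hypothetical input with columns: region, product, amount.
    val sales = spark.read.parquet("/tmp/sales")

    // One query computes both aggregates, instead of two separate
    // group-by queries joined back together.
    val result = sales
      .withColumn("region_total",  sum("amount").over(Window.partitionBy("region")))
      .withColumn("product_total", sum("amount").over(Window.partitionBy("product")))

Note that each distinct partitioning still implies its own shuffle, which is one reason benchmarking both approaches is worthwhile.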

Data growth vs Cluster Size planning

2019-02-11 Thread Aakash Basu
Hi, I ran a dataset of *200 columns and 0.2M records* on a cluster of *1 master (18 GB), 2 slaves (32 GB each), 16 cores/slave*; it took around *772 minutes* for a *very large ML-tuning-based job* (training). Now, my requirement is to run the *same operation on 3M records*. Any idea on how we should
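For rough scale only: 3M records is 15 times the 0.2M benchmark, so a naive linear extrapolation on the same cluster gives about 772 * 15 ≈ 11,580 minutes. ML tuning workloads rarely scale linearly (shuffle volume, memory pressure, and per-iteration cost all interact), so treat this as a sanity-check bound rather than an estimate.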

Re: Spark on YARN, HowTo kill executor or individual task?

2019-02-11 Thread Gabor Somogyi
What blocks you from putting if conditions inside the mentioned map function?
On Mon, Feb 11, 2019 at 10:31 AM Serega Sheypak wrote:
> Yeah, but I don't need to crash the entire app, I want to fail several tasks
> or executors and then wait for completion.
> Sun, 10 Feb 2019 at 21:49, Gabor Somogyi

Re: Spark on YARN, HowTo kill executor or individual task?

2019-02-11 Thread Serega Sheypak
Yeah, but I don't need to crash the entire app, I want to fail several tasks or executors and then wait for completion.
Sun, 10 Feb 2019 at 21:49, Gabor Somogyi:
> Another approach is adding an artificial exception into the application's
> source code like this:
>
> val query = input.toDS.map(_ /
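The quoted snippet is cut off, but a minimal sketch of the artificial-exception idea might look like the following; the rate source and the failing condition are assumptions filled in for illustration, not the original code:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("artificial-failure").getOrCreate()
    import spark.implicits._

    // The rate source emits an increasing "value" column, handy for demos.
    val input = spark.readStream.format("rate").load()

    // Throw inside the task when a chosen record arrives, simulating a
    // task failure. Spark retries the failed task, so a deterministic
    // failure eventually exceeds spark.task.maxFailures and stops the query.
    val query = input.select($"value").as[Long]
      .map(v => if (v >= 100L) throw new RuntimeException("artificial failure") else v)
      .writeStream
      .format("console")
      .start()

    query.awaitTermination()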