from:"Naveen Madhire"

spark session jdbc performance

2017-10-24 Thread Naveen Madhire

Hi, I am trying to fetch data from an Oracle DB using a subquery and am experiencing a lot of performance issues. Below is the query I am using (Spark 2.0.2): val df = spark_session.read.format("jdbc").option("driver", "oracle.jdbc.OracleDriver").option("url", jdbc_url)

Re: Spark streaming persist to hdfs question

2017-06-25 Thread Naveen Madhire

> st to use Flume, if possible, as it has built-in HDFS log rolling capabilities
>
> On Mon, Jun 26, 2017 at 1:09 PM, Naveen Madhire <vmadh...@umail.iu.edu> wrote:
>
>> Hi,
>>
>> I am using spark streaming with 1 minute duration to read data from kafka

Spark streaming persist to hdfs question

2017-06-25 Thread Naveen Madhire

Hi, I am using Spark Streaming with a 1-minute batch duration to read data from a Kafka topic, apply transformations and persist the results into HDFS. The application creates a new directory every minute containing many partition files (one per partition). What parameter do I need to change/configure to persist
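Each micro-batch writes its own output directory with one file per partition, so the file count grows with both the batch rate and the partition count. The plain-Python sketch below works through that arithmetic. The function name and the 200-partition figure are assumptions for illustration. Calling coalesce (or repartition) on each batch's RDD before writing is one common way to reduce the count, at the cost of less write parallelism.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def output_files_per_day(batch_interval_s: int, partitions_per_batch: int) -> int:
    """Files written in one day if every batch saves one file per partition."""
    batches_per_day = SECONDS_PER_DAY // batch_interval_s
    return batches_per_day * partitions_per_batch


# 1-minute batches as in the question; 200 partitions is a hypothetical value.
print(output_files_per_day(60, 200))  # 288000
# After coalescing each batch to a single partition before writing:
print(output_files_per_day(60, 1))    # 1440
```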

Repartition question

2015-08-03 Thread Naveen Madhire

Hi All, I am running the Wikipedia parsing example from the Advanced Analytics with Spark book. https://github.com/sryza/aas/blob/d3f62ef3ed43a59140f4ae8afbe2ef81fc643ef2/ch06-lsa/src/main/scala/com/cloudera/datascience/lsa/ParseWikipedia.scala#l112 The partitions of the RDD returned by

pyspark issue

2015-07-27 Thread Naveen Madhire

Hi, I am running PySpark on Windows and I am seeing an error while adding pyFiles to the SparkContext. Below is the example: sc = SparkContext("local", "Sample", pyFiles="C:/sample/yattag.zip") This fails with a "no file found" error for C. The below logic is treating the path as individual files like C,
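The "no file found" error for C is consistent with pyFiles being passed as a string. PySpark's SparkContext iterates over pyFiles and adds each item, and iterating over a Python string yields single characters. The plain-Python sketch below reproduces that effect. add_py_files is a hypothetical stand-in for that loop, not PySpark code.

```python
def add_py_files(py_files):
    """Hypothetical stand-in for PySpark's loop over the pyFiles argument."""
    added = []
    for path in (py_files or []):  # iterates whatever collection it is given
        added.append(path)
    return added


wrong = add_py_files("C:/sample/yattag.zip")    # a bare string
right = add_py_files(["C:/sample/yattag.zip"])  # a one-element list

print(wrong[:3])  # ['C', ':', '/']  -> "C" is treated as a file name
print(right)      # ['C:/sample/yattag.zip']
```

With a real context, the list form would read SparkContext("local", "Sample", pyFiles=["C:/sample/yattag.zip"]).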

Re: PySpark Nested Json Parsing

2015-07-20 Thread Naveen Madhire

I had a similar issue with Spark 1.3. After migrating to Spark 1.4 and using sqlContext.read.json, it worked well. I think you can look at the DataFrame select and explode options to read the nested JSON elements, arrays, etc. Thanks. On Mon, Jul 20, 2015 at 11:07 AM, Davies Liu dav...@databricks.com
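In a DataFrame, nested struct fields can be selected with a dotted path, for example df.select("entities.user_mentions"), and arrays can be turned into rows with explode. The plain-Python sketch below mimics only the dotted-path lookup on a parsed JSON record. select_path and the sample record are hypothetical, and this is not how Spark resolves columns internally.

```python
import json

record = json.loads(
    '{"id": 1, "entities": {"user_mentions": [{"screen_name": "a"}, {"screen_name": "b"}]}}'
)


def select_path(row: dict, path: str):
    """Follow a dotted path such as "entities.user_mentions" through nested dicts."""
    value = row
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None  # treat a missing field as None (null)
        value = value[key]
    return value


print(select_path(record, "entities.user_mentions"))
print(select_path(record, "entities.missing"))  # None
```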

Re: How to extract complex JSON structures using Apache Spark 1.4.0 Data Frames

2015-07-18 Thread Naveen Madhire

I am facing the same issue. I tried this but got a compilation error for the $ in the explode function, so I had to modify it as follows to make it work: df.select(explode(new Column("entities.user_mentions")).as("mention")) On Wed, Jun 24, 2015 at 2:48 PM, Michael Armbrust
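The $"..." column syntax comes from the SQL implicits, so in Scala it only compiles once something like import sqlContext.implicits._ is in scope. new Column("...") or functions.col("...") avoid that dependency. Separately, explode produces one output row per element of an array column. The plain-Python sketch below mirrors that behaviour on dicts. explode_rows and the sample data are hypothetical. The sketch keeps the other fields, as select("id", explode(...)) would. Like explode, it drops rows whose array is empty or missing.

```python
def explode_rows(rows, array_field, alias):
    """Emit one output row per array element, keeping the row's other fields."""
    out = []
    for row in rows:
        for element in row.get(array_field) or []:  # empty/missing arrays yield no rows
            new_row = {k: v for k, v in row.items() if k != array_field}
            new_row[alias] = element
            out.append(new_row)
    return out


tweets = [
    {"id": 1, "user_mentions": ["alice", "bob"]},
    {"id": 2, "user_mentions": []},
]
for r in explode_rows(tweets, "user_mentions", "mention"):
    print(r)
```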

Re: Spark and HDFS

2015-07-15 Thread Naveen Madhire

Yes, I did this recently. You need to copy the Cloudera cluster's configuration files to the local machine and set HADOOP_CONF_DIR or YARN_CONF_DIR. The local machine should also be able to ssh to the Cloudera cluster. On Wed, Jul 15, 2015 at 8:51 AM, ayan guha guha.a...@gmail.com wrote:
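Before submitting from the local machine, it can help to confirm that the copied client configuration is in the directory HADOOP_CONF_DIR (or YARN_CONF_DIR) points to. The small Python check below assumes the usual Hadoop client file names core-site.xml, hdfs-site.xml and yarn-site.xml. missing_hadoop_conf_files is a hypothetical helper, and the files a given cluster actually needs may differ.

```python
import os
from pathlib import Path

EXPECTED_FILES = ("core-site.xml", "hdfs-site.xml", "yarn-site.xml")


def missing_hadoop_conf_files(env=None):
    """Return expected config files absent from HADOOP_CONF_DIR (or YARN_CONF_DIR)."""
    env = os.environ if env is None else env
    conf_dir = env.get("HADOOP_CONF_DIR") or env.get("YARN_CONF_DIR")
    if not conf_dir:
        return list(EXPECTED_FILES)  # neither variable is set
    return [name for name in EXPECTED_FILES if not (Path(conf_dir) / name).is_file()]


if __name__ == "__main__":
    missing = missing_hadoop_conf_files()
    print("missing:", missing or "none")
```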

Re: Unit tests of spark application

2015-07-13 Thread Naveen Madhire

Use spark-testing-base from spark-packages.org as a basis for your unit tests. On Fri, Jul 10, 2015 at 12:03 PM, Daniel Siegmann daniel.siegm...@teamaol.com wrote: On Fri, Jul 10, 2015 at 1:41 PM, Naveen Madhire vmadh...@umail.iu.edu wrote: I want to write JUnit test cases in Scala

Unit tests of spark application

2015-07-10 Thread Naveen Madhire

Hi, I want to write JUnit test cases in Scala for testing a Spark application. Is there any guide or link I can refer to? Thank you very much. -Naveen
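Alongside spark-testing-base, one lightweight complement is to keep per-record logic in plain functions. Those functions can be unit-tested without starting a SparkContext, leaving Spark-backed tests for the job wiring. The sketch below uses Python's unittest rather than JUnit/Scala, and parse_line is a hypothetical example function.

```python
import unittest


def parse_line(line: str):
    """Hypothetical per-record logic: split 'user,count' into (user, int(count))."""
    user, count = line.split(",")
    return user.strip(), int(count)


class ParseLineTest(unittest.TestCase):
    def test_parses_valid_line(self):
        self.assertEqual(parse_line("alice, 3"), ("alice", 3))

    def test_rejects_bad_count(self):
        with self.assertRaises(ValueError):
            parse_line("bob,x")


if __name__ == "__main__":
    # In a Spark job the same function would be applied as rdd.map(parse_line).
    unittest.main(argv=["parse-line-tests"], exit=False)
```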

DataFrame question

2015-07-07 Thread Naveen Madhire

Hi All, I am working with DataFrames and have been struggling with this; any pointers would be helpful. I have a JSON file with a schema like this:
 |-- links: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- desc: string (nullable = true)
 |    |    |--

Re: Has anyone run Python Spark application on Yarn-cluster mode ? (which has 3rd party Python modules to be shipped with)

2015-06-25 Thread Naveen Madhire

Hi Marcelo, quick question. I am using Spark 1.3 in YARN client mode. It is working well, provided I manually pip-install all the third-party libraries like numpy on the executor nodes. So does the SPARK-5479 fix in 1.5 that you mentioned fix this as well? Thanks. On Thu, Jun
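For pure-Python third-party modules, an alternative to installing them on every node is to ship them as a .zip through spark-submit's --py-files (or SparkContext.addPyFile). Spark places these files on the Python path of the application. Packages with compiled extensions, such as numpy, generally cannot be imported from a zip, so they still need to be installed on the nodes. The sketch below builds such a zip with the standard library and checks that Python can import from it. The package name mydeps_demo is hypothetical.

```python
import sys
import tempfile
import zipfile
from pathlib import Path


def zip_package(package_dir: Path, zip_path: Path) -> Path:
    """Zip a package directory so its top-level folder sits at the archive root."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for file in package_dir.rglob("*.py"):
            zf.write(file, str(file.relative_to(package_dir.parent)))
    return zip_path


def build_demo_zip(workdir: Path, name: str = "mydeps_demo") -> Path:
    """Create a tiny hypothetical pure-Python package and zip it."""
    pkg = workdir / name
    pkg.mkdir()
    (pkg / "__init__.py").write_text("VALUE = 42\n")
    return zip_package(pkg, workdir / "deps.zip")


if __name__ == "__main__":
    archive = build_demo_zip(Path(tempfile.mkdtemp()))
    sys.path.insert(0, str(archive))  # locally mimic what --py-files does for executors
    import mydeps_demo
    print(mydeps_demo.VALUE)
    # On a cluster: spark-submit --py-files deps.zip app.py
```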

Re: How to set HBaseConfiguration in Spark

2015-05-20 Thread Naveen Madhire

The Cloudera blog has some details; please check whether this is helpful to you. http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/ Thanks. On Wed, May 20, 2015 at 4:21 AM, donhoff_h 165612...@qq.com wrote: Hi all, I wrote a program to get an HBaseConfiguration object in Spark.

Fwd: Sample Spark Program Error

2014-12-31 Thread Naveen Madhire

Hi All, I am trying to run a sample Spark program using Scala and SBT. Below is the program:
def main(args: Array[String]) {
  val logFile = "E:/ApacheSpark/usb/usb/spark/bin/README.md" // Should be some file on your system
  val sc = new SparkContext("local", "Simple App",

Re: Fwd: Sample Spark Program Error

2014-12-31 Thread Naveen Madhire

Lines with a: 24, Lines with b: 15. The exception seems to be happening during Spark cleanup after your code executes. Try adding sc.stop() at the end of your program to see if the exception goes away. On Wednesday, December 31, 2014 6:40 AM, Naveen Madhire vmadh...@umail.iu.edu wrote
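The sample program counts the lines containing "a" and the lines containing "b", which produces the "Lines with a: 24, Lines with b: 15" output. The plain-Python sketch below shows the same counting logic. Its comments show where a try/finally would guarantee that sc.stop() runs even if the job fails. It is written in Python rather than the original Scala/SBT program. count_lines_with is a hypothetical helper, and the sample lines are made up.

```python
def count_lines_with(lines, needle: str) -> int:
    """Count lines containing `needle`, like filter(lambda l: needle in l).count()."""
    return sum(1 for line in lines if needle in line)


sample = ["Apache Spark", "big data", "fast engine", "scala"]

# In the Spark version, the job sits between creating and stopping the context:
#   sc = SparkContext("local", "Simple App")
#   try:
#       ... run the counts ...
#   finally:
#       sc.stop()  # stop the context explicitly instead of relying on exit-time cleanup
num_a = count_lines_with(sample, "a")
num_b = count_lines_with(sample, "b")
print(f"Lines with a: {num_a}, Lines with b: {num_b}")
```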
