Re: jdbcRDD for data ingestion from RDBMS

2016-10-18 Thread Mich Talebzadeh
Hi, If we are talking about billions of records, it depends on your network and RDBMS. With parallel connections, from my experience it works OK for dimension tables of moderate size, in that you can open parallel connections to the RDBMS (assuming the table has a primary key/unique column) to
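
For context, a minimal sketch of this kind of parallel read via the Spark SQL JDBC data source (the url, table, and bounds are illustrative, not from the thread; it assumes an existing SparkContext sc and a numeric primary key column id whose min/max are known):

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)
    val df = sqlContext.read.format("jdbc").options(Map(
      "url"             -> "jdbc:postgresql://dbserver/mydb", // placeholder
      "dbtable"         -> "schema.dim_table",                // placeholder
      "partitionColumn" -> "id",        // numeric, ideally indexed column
      "lowerBound"      -> "1",         // min(id), queried beforehand
      "upperBound"      -> "10000000",  // max(id), queried beforehand
      "numPartitions"   -> "8"          // 8 parallel connections to the RDBMS
    )).load()

Each of the 8 tasks opens its own connection and reads one slice of the id range, which is what makes the parallel ingestion work.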

Re: jdbcRDD for data ingestion from RDBMS

2016-10-18 Thread Teng Qiu
Hi Ninad, I believe the purpose of JdbcRDD is to use an RDBMS as an additional data source during data processing; the main goal of Spark is still analyzing data from an HDFS-like file system. Using Spark as a data integration tool to transfer billions of records from an RDBMS to HDFS etc. could work, but

Re: JdbcRDD Constructor

2015-10-20 Thread satish chandra j
Hi Deenar, Thanks for your valuable inputs. Here is a situation: a source table does not have any column (unique values, numeric, and sequential) suitable as the partition column to be specified for the JdbcRDD constructor or the DataSource API. How to proceed further in this scenario, and also

Re: JdbcRDD Constructor

2015-10-20 Thread Deenar Toraskar
You have 2 options: a) don't use partitioning; if the table is small, Spark will only use one task to load it: val jdbcDF = sqlContext.read.format("jdbc").options(Map("url" -> "jdbc:postgresql:dbserver", "dbtable" -> "schema.tablename")).load() b) create a view that includes a hashcode column
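
A sketch of option (b), assuming a PostgreSQL source; the view, its hash expression, and the bucket count are illustrative (other databases need a different hash function):

    // Assumes this view exists on the database side, e.g. (PostgreSQL):
    //   CREATE VIEW schema.tablename_hashed AS
    //   SELECT t.*, abs(hashtext(t.some_key)) % 8 AS bucket
    //   FROM schema.tablename t;
    val jdbcDF = sqlContext.read.format("jdbc").options(Map(
      "url"             -> "jdbc:postgresql:dbserver",
      "dbtable"         -> "schema.tablename_hashed",
      "partitionColumn" -> "bucket", // synthetic numeric column from the view
      "lowerBound"      -> "0",
      "upperBound"      -> "7",
      "numPartitions"   -> "8"
    )).load()

The synthetic bucket column gives Spark a numeric range to partition on even though the underlying table has no suitable key.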

Re: JdbcRDD Constructor

2015-09-24 Thread satish chandra j
Hi Deenar, Please find the SQL query below: var SQL_RDD = new JdbcRDD(sc, () => DriverManager.getConnection(url, user, pass), "select col1, col2, col3..col37 from schema.Table LIMIT ? OFFSET ?", 100, 0, 1, (r: ResultSet) => (r.getInt("col1"), r.getInt("col2")...r.getInt("col37"))) When I

Re: JdbcRDD Constructor

2015-09-24 Thread Deenar Toraskar
On 24 September 2015 at 17:48, Deenar Toraskar <deenar.toras...@thinkreactive.co.uk> wrote: > you are interpreting the JdbcRDD API incorrectly. If you want to use > partitions, then the column used to partition and present in the where > clause must be numeric and the lower bound and upper bound

Re: JdbcRDD Constructor

2015-09-23 Thread Rishitesh Mishra
Which version of Spark are you using? I can get correct results using JdbcRDD. In fact there is a test suite precisely for this (JdbcRDDSuite). I changed it according to your input and got correct results from this test suite. On Wed, Sep 23, 2015 at 11:00 AM, satish chandra j

Re: JdbcRDD Constructor

2015-09-23 Thread Rishitesh Mishra
I am using Spark 1.5. I always get count = 100, irrespective of num partitions. On Wed, Sep 23, 2015 at 5:00 PM, satish chandra j wrote: > Hi, > Currently using Spark 1.2.2; could you please let me know the correct > output count which you got by using

Re: JdbcRDD Constructor

2015-09-23 Thread satish chandra j
Hi, Could anybody provide inputs if they have come across a similar issue? @Rishitesh, could you provide any sample code using JdbcRDDSuite? Regards, Satish Chandra On Wed, Sep 23, 2015 at 5:14 PM, Rishitesh Mishra wrote: > I am using Spark 1.5. I always get count =

Re: JdbcRDD Constructor

2015-09-23 Thread satish chandra j
Hi, Currently using Spark 1.2.2; could you please let me know the correct output count which you got by using JdbcRDDSuite. Regards, Satish Chandra On Wed, Sep 23, 2015 at 4:02 PM, Rishitesh Mishra wrote: > Which version of Spark are you using? I can get

Re: JdbcRDD Constructor

2015-09-23 Thread Deenar Toraskar
Satish, can you post the SQL query you are using? The SQL query must have 2 placeholders and both of them should form an inclusive range (<= and >=), e.g. select title, author from books where ? <= id and id <= ? Are you doing this? Deenar On 23 September 2015 at 20:18, Deenar Toraskar <
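
For comparison, a sketch of a corrected version of the snippet from earlier in the thread (column list shortened; url, user, and pass are placeholders), binding both placeholders to an inclusive id range instead of LIMIT/OFFSET:

    import java.sql.{DriverManager, ResultSet}
    import org.apache.spark.rdd.JdbcRDD

    val (url, user, pass) = ("jdbc:postgresql:dbserver", "user", "pass") // placeholders
    val rows = new JdbcRDD(
      sc, // existing SparkContext
      () => DriverManager.getConnection(url, user, pass),
      "select col1, col2 from schema.Table where ? <= id and id <= ?",
      1,      // lower bound, substituted for the first ?
      100000, // upper bound, substituted for the second ?
      4,      // number of partitions the id range is split into
      (r: ResultSet) => (r.getInt("col1"), r.getInt("col2")))

Each partition then fetches a disjoint sub-range of ids, so the result no longer depends on LIMIT/OFFSET semantics.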

Re: JdbcRDD and ClassTag issue

2015-07-20 Thread nitinkalra2000
Thanks Sujee :) -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/JdbcRDD-and-ClassTag-issue-tp18570p23912.html

Re: JdbcRDD, ClassCastException with scala.Function0

2015-02-19 Thread Cody Koeninger
At the beginning of the code, do a query to find the current maximum ID. Don't just put in an arbitrarily large value, or all of your rows will end up in one Spark partition at the beginning of the range. The question of keys is up to you... all that you need to be able to do is write a sql
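
A sketch of that pattern (the table, column, and url are illustrative): look up max(id) over a plain JDBC connection first, then use it as the upper bound:

    import java.sql.{DriverManager, ResultSet}
    import org.apache.spark.rdd.JdbcRDD

    val url = "jdbc:postgresql:dbserver" // placeholder
    // Query the current maximum id instead of guessing a large bound.
    val maxId: Long = {
      val conn = DriverManager.getConnection(url)
      try {
        val rs = conn.createStatement().executeQuery("select max(id) from mytable")
        rs.next()
        rs.getLong(1)
      } finally conn.close()
    }

    val rdd = new JdbcRDD(
      sc, // existing SparkContext
      () => DriverManager.getConnection(url),
      "select id, payload from mytable where ? <= id and id <= ?",
      1, maxId, 8,
      (r: ResultSet) => (r.getLong("id"), r.getString("payload")))

With a realistic upper bound, the id range is split evenly across the 8 partitions instead of piling every row into the first ones.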

Re: JdbcRDD, ClassCastException with scala.Function0

2015-02-19 Thread Dmitry Goldenberg
Yup, I did see that. Good point though, Cody. The mismatch was happening for me when I was trying to get the 'new JdbcRDD' approach going. Once I switched to the 'create' method things are working just fine. Was just able to refactor the 'get connection' logic into a 'DbConnection implements

Re: JdbcRDD, ClassCastException with scala.Function0

2015-02-19 Thread Dmitry Goldenberg
That's a good point, thanks. Is there a way to instrument continuous real-time streaming of data out of a database? If the data keeps changing, one way to implement extraction would be to keep track of something like the last-modified timestamp and instrument the query to be 'where lastmodified > ?'
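
There is no built-in JDBC streaming source here, but a minimal sketch of that watermark idea (the table, columns, and how the watermark is persisted are all assumptions) could look like:

    import java.sql.{DriverManager, Timestamp}
    import scala.collection.mutable.ArrayBuffer

    // Pull rows modified since the last run; the caller persists the returned
    // watermark (e.g. to a file or table) and passes it back in next time.
    def pullSince(url: String, watermark: Timestamp): (Seq[(Long, String)], Timestamp) = {
      val conn = DriverManager.getConnection(url)
      try {
        val ps = conn.prepareStatement(
          "select id, payload, lastmodified from mytable " +
          "where lastmodified > ? order by lastmodified")
        ps.setTimestamp(1, watermark)
        val rs = ps.executeQuery()
        val rows = ArrayBuffer[(Long, String)]()
        var newMark = watermark
        while (rs.next()) {
          rows += ((rs.getLong("id"), rs.getString("payload")))
          newMark = rs.getTimestamp("lastmodified")
        }
        (rows, newMark)
      } finally conn.close()
    }

Each batch could then be distributed with sc.parallelize and written out; note this captures inserts and updates but not deletes.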

Re: JdbcRDD, ClassCastException with scala.Function0

2015-02-18 Thread Dmitry Goldenberg
Thanks, Cody. Yes, I originally started off by looking at that but I get a compile error if I try and use that approach: constructor JdbcRDD in class JdbcRDD<T> cannot be applied to given types. Not to mention that JavaJdbcRDDSuite somehow manages to not pass in the class tag (the last argument).

Re: JdbcRDD, ClassCastException with scala.Function0

2015-02-18 Thread Cody Koeninger
Take a look at https://github.com/apache/spark/blob/v1.2.1/core/src/test/java/org/apache/spark/JavaJdbcRDDSuite.java On Wed, Feb 18, 2015 at 11:14 AM, dgoldenberg dgoldenberg...@gmail.com wrote: I'm reading data from a database using JdbcRDD, in Java, and I have an implementation of

Re: JdbcRDD, ClassCastException with scala.Function0

2015-02-18 Thread Cody Koeninger
Is sc there a SparkContext or a JavaSparkContext? The compilation error seems to indicate the former, but JdbcRDD.create expects the latter. On Wed, Feb 18, 2015 at 12:30 PM, Dmitry Goldenberg dgoldenberg...@gmail.com wrote: I have tried that as well, I get a compile error -- [ERROR]

Re: JdbcRDD, ClassCastException with scala.Function0

2015-02-18 Thread Cody Koeninger
That test I linked https://github.com/apache/spark/blob/v1.2.1/core/src/test/java/org/apache/spark/JavaJdbcRDDSuite.java#L90 is calling a static method JdbcRDD.create, not new JdbcRDD. Is that what you tried doing? On Wed, Feb 18, 2015 at 12:00 PM, Dmitry Goldenberg dgoldenberg...@gmail.com

Re: JdbcRDD, ClassCastException with scala.Function0

2015-02-18 Thread Dmitry Goldenberg
I have tried that as well, I get a compile error -- [ERROR] ...SparkProto.java:[105,39] error: no suitable method found for create(SparkContext, <anonymous ConnectionFactory>, String, int, int, int, <anonymous Function<ResultSet,Integer>>) The code is a copy and paste: JavaRDD<Integer> jdbcRDD =

Re: JdbcRDD, ClassCastException with scala.Function0

2015-02-18 Thread Dmitry Goldenberg
Cody, you were right, I had a copy and paste snag where I ended up with a vanilla SparkContext rather than a Java one. I also had to *not* use my function subclasses, rather just use anonymous inner classes for the Function stuff and that got things working. I'm fully following the JdbcRDD.create

Re: JdbcRDD, ClassCastException with scala.Function0

2015-02-18 Thread Cody Koeninger
Can't you implement the org.apache.spark.api.java.function.Function interface and pass an instance of that to JdbcRDD.create? On Wed, Feb 18, 2015 at 3:48 PM, Dmitry Goldenberg dgoldenberg...@gmail.com wrote: Cody, you were right, I had a copy and paste snag where I ended up with a vanilla
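
That is the JdbcRDD.create route: it takes a JavaSparkContext, a JdbcRDD.ConnectionFactory, and the Java Function interface, so no ClassTag is involved. A sketch (written in Scala for consistency with the other snippets here; the url and query are placeholders):

    import java.sql.{Connection, DriverManager, ResultSet}
    import org.apache.spark.api.java.JavaSparkContext
    import org.apache.spark.api.java.function.{Function => JFunction}
    import org.apache.spark.rdd.JdbcRDD

    val jsc = new JavaSparkContext(sc) // wrap the existing SparkContext
    val rdd = JdbcRDD.create(
      jsc,
      new JdbcRDD.ConnectionFactory {
        override def getConnection: Connection =
          DriverManager.getConnection("jdbc:postgresql:dbserver") // placeholder
      },
      "select id from mytable where ? <= id and id <= ?",
      1L, 100000L, 4,
      new JFunction[ResultSet, Integer] {
        override def call(r: ResultSet): Integer = r.getInt(1)
      })

In Java the two anonymous classes become anonymous inner classes (or lambdas on Java 8), which is what JavaJdbcRDDSuite does.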

Re: JdbcRDD, ClassCastException with scala.Function0

2015-02-18 Thread Dmitry Goldenberg
That's exactly what I was doing. However, I ran into runtime issues with doing that. For instance, I had a public class DbConnection extends AbstractFunction0<Connection> implements Serializable. I got a runtime error from Spark complaining that DbConnection wasn't an instance of scala.Function0.

Re: JdbcRdd for Python

2015-01-05 Thread Michael Armbrust
I'll add that there is a JDBC connector for the Spark SQL data sources API in the works, and this will work with Python (through the standard SchemaRDD type conversions). On Mon, Jan 5, 2015 at 7:09 AM, Cody Koeninger c...@koeninger.org wrote: Java DataBase Connectivity is, as far as I know, JVM

Re: JdbcRdd for Python

2015-01-05 Thread Cody Koeninger
Java DataBase Connectivity is, as far as I know, JVM-specific. The JdbcRDD is expecting to deal with JDBC Connection and ResultSet objects. I haven't done any Python development in over a decade, but if someone wants to work together on a Python equivalent I'd be happy to help out. The original

Re: JdbcRdd for Python

2015-01-02 Thread elliott cordo
Yeah.. I went through the source, and unless I'm missing something it's not. Agreed, I'd love to see it implemented! On Fri, Jan 2, 2015 at 3:59 PM, Tim Schweichler tim.schweich...@healthination.com wrote: Doesn't look like it is at the moment. If that's the case I'd love to see it

Re: JdbcRdd for Python

2015-01-02 Thread Tim Schweichler
Doesn't look like it is at the moment. If that's the case I'd love to see it implemented. From: elliott cordo elliottco...@gmail.com Date: Friday, January 2, 2015 at 8:17 AM To: user@spark.apache.org

Re: JdbcRDD and ClassTag issue

2015-01-01 Thread Sujee
Hi, I encountered the same issue and solved it. Please check my blog post http://www.sparkexpert.com/2015/01/02/load-database-data-into-spark-using-jdbcrdd-in-java/ Thank you

Re: JdbcRDD

2015-01-01 Thread Sujee
Hi, I wrote a blog post about this. http://www.sparkexpert.com/2015/01/02/load-database-data-into-spark-using-jdbcrdd-in-java/ -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/JdbcRDD-tp19233p20939.html

Re: JdbcRDD

2014-11-18 Thread mykidong
I also had the same problem using JdbcRDD from Java. For me, I wrote a class in Scala to build the JdbcRDD, and I call this instance from Java. For instance, JdbcRDDWrapper.scala like this: ... import java.sql._ import org.apache.spark.SparkContext import org.apache.spark.rdd.JdbcRDD import
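
A minimal version of that wrapper pattern, reconstructed from the imports above (the query and names are illustrative, not the original post's code):

    import java.sql.{DriverManager, ResultSet}
    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.JdbcRDD

    object JdbcRDDWrapper {
      // Built on the Scala side so the ClassTag is supplied implicitly;
      // Java callers can use JdbcRDDWrapper.build(sc, url).toJavaRDD().
      def build(sc: SparkContext, url: String): JdbcRDD[(Int, String)] =
        new JdbcRDD(
          sc,
          () => DriverManager.getConnection(url),
          "select id, name from mytable where ? <= id and id <= ?",
          1, 100000, 4,
          (r: ResultSet) => (r.getInt("id"), r.getString("name")))
    }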

Re: JdbcRDD

2014-11-18 Thread Krishna
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/JdbcRDD-tp19233p19235.html

Re: JdbcRDD in Java

2014-10-28 Thread Sean Owen
That declaration looks OK for Java 8, at least when I tried it just now vs master. The only thing I see wrong here is that getInt throws an exception, which means the lambda has to be more complicated than this. This is Java code here calling the constructor, so yes, it can work fine from Java (8). On

Re: jdbcRDD from JAVA

2014-08-31 Thread Sean Owen
https://spark.apache.org/docs/latest/api/java/org/apache/spark/rdd/JdbcRDD.html#JdbcRDD(org.apache.spark.SparkContext, scala.Function0, java.lang.String, long, long, int, scala.Function1, scala.reflect.ClassTag) I don't think there is a completely Java-friendly version of this class. However you