Hi,
If we are talking about billions of records, it depends on your network and
RDBMS. With parallel connections, from my experience it works OK for
dimension tables of moderate size, in that you can have parallel
connections to the RDBMS (assuming the RDBMS has a primary key/unique
column) to
Hi Ninad, I believe the purpose of JdbcRDD is to use an RDBMS as an
additional data source during data processing; the main goal of Spark is
still analyzing data from HDFS-like file systems.
Using Spark as a data integration tool to transfer billions of records
from an RDBMS to HDFS etc. could work, but
Hi Deenar,
Thanks for your valuable inputs.
Here is a situation: what if a source table does not have any column (with
unique, numeric and sequential values) which is suitable as the partition
column to be specified for the JdbcRDD constructor or the DataSource API?
How to proceed in this scenario, and also
You have 2 options:
a) don't use partitioning; if the table is small, Spark will only use one
task to load it
val jdbcDF = sqlContext.read.format("jdbc").options(
Map("url" -> "jdbc:postgresql:dbserver",
"dbtable" -> "schema.tablename")).load()
b) create a view that includes a hashcode column
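For example, a minimal sketch of option b), assuming a hypothetical view
schema.tablename_hashed that adds a numeric hash_col:

// Hypothetical view, e.g. in Postgres:
//   CREATE VIEW schema.tablename_hashed AS
//   SELECT t.*, mod(abs(hashtext(id::text)), 100) AS hash_col
//   FROM schema.tablename t
val jdbcDF = sqlContext.read.format("jdbc").options(
  Map("url" -> "jdbc:postgresql:dbserver",
    "dbtable" -> "schema.tablename_hashed",
    "partitionColumn" -> "hash_col",
    "lowerBound" -> "0",
    "upperBound" -> "100",
    "numPartitions" -> "10")).load()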
Hi Deenar,
Please find the SQL query below:
var SQL_RDD = new JdbcRDD(sc,
  () => DriverManager.getConnection(url, user, pass),
  "select col1, col2, col3..col37 from schema.Table LIMIT ? OFFSET ?",
  100, 0, 1,
  (r: ResultSet) => (r.getInt("col1"), r.getInt("col2")...r.getInt("col37")))
When I
On 24 September 2015 at 17:48, Deenar Toraskar <
deenar.toras...@thinkreactive.co.uk> wrote:
> you are interpreting the JdbcRDD API incorrectly. If you want to use
> partitions, then the column used to partition and present in the where
> clause must be numeric and the lower bound and upper bound
Which version of Spark are you using? I can get correct results using
JdbcRDD. In fact, there is a test suite precisely for this (JdbcRDDSuite).
I changed it according to your input and got correct results from this test
suite.
On Wed, Sep 23, 2015 at 11:00 AM, satish chandra j
I am using Spark 1.5. I always get count = 100, irrespective of num
partitions.
On Wed, Sep 23, 2015 at 5:00 PM, satish chandra j
wrote:
> Hi,
> Currently using Spark 1.2.2. Could you please let me know the correct
> output count which you got by using
Hi,
Could anybody provide inputs if they have come across a similar issue?
@Rishitesh
Could you provide any sample code to use JdbcRDDSuite?
Regards,
Satish Chandra
On Wed, Sep 23, 2015 at 5:14 PM, Rishitesh Mishra
wrote:
> I am using Spark 1.5. I always get count =
Hi,
Currently using Spark 1.2.2. Could you please let me know the correct
output count which you got by using JdbcRDDSuite?
Regards,
Satish Chandra
On Wed, Sep 23, 2015 at 4:02 PM, Rishitesh Mishra
wrote:
> Which version of Spark are you using? I can get
Satish
Can you post the SQL query you are using?
The SQL query must have 2 placeholders and both of them should form an
inclusive range (<= and >=).
e.g. select title, author from books where ? <= id and id <= ?
Are you doing this?
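For reference, a minimal sketch of that pattern (connection details, bounds
and column names are hypothetical):

val booksRDD = new JdbcRDD(sc,
  () => DriverManager.getConnection(url, user, pass),
  "select title, author from books where ? <= id and id <= ?",
  1, 1000, 4, // lowerBound, upperBound, numPartitions; the ?s are bound per partition
  (r: ResultSet) => (r.getString("title"), r.getString("author")))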
Deenar
On 23 September 2015 at 20:18, Deenar Toraskar <
Thanks Sujee :)
At the beginning of the code, do a query to find the current maximum ID.
Don't just put in an arbitrarily large value, or all of your rows will end
up in 1 Spark partition at the beginning of the range.
The question of keys is up to you... all that you need to be able to do is
write a sql
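A minimal sketch of that lookup (table and column names are hypothetical):

// Query the current max id up front and use it as the JdbcRDD upperBound,
// rather than guessing an arbitrarily large constant.
val conn = DriverManager.getConnection(url, user, pass)
val maxId = try {
  val rs = conn.createStatement().executeQuery("select max(id) from schema.tablename")
  rs.next()
  rs.getLong(1)
} finally {
  conn.close()
}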
Yup, I did see that. Good point though, Cody. The mismatch was happening
for me when I was trying to get the 'new JdbcRDD' approach going. Once I
switched to the 'create' method things were working just fine. I was just
able to refactor the 'get connection' logic into a 'DbConnection implements
That's a good point, thanks. Is there a way to instrument continuous
real-time streaming of data out of a database? If the data keeps changing,
one way to implement extraction would be to keep track of something like
the last-modified timestamp and instrument the query to be 'where
lastmodified > ?'
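One hedged sketch of that idea, assuming last_modified is stored as epoch
milliseconds so it can double as JdbcRDD's numeric bound column (all names
hypothetical):

var watermark = 0L // highest last_modified_ms already processed
while (true) {
  val now = System.currentTimeMillis()
  val batch = new JdbcRDD(sc,
    () => DriverManager.getConnection(url, user, pass),
    "select id, payload from schema.tablename " +
      "where ? <= last_modified_ms and last_modified_ms <= ?",
    watermark + 1, now, 4,
    (r: ResultSet) => (r.getLong("id"), r.getString("payload")))
  // ... process the batch ...
  watermark = now
  Thread.sleep(60000) // poll interval
}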
Thanks, Cody. Yes, I originally started off by looking at that but I get a
compile error if I try and use that approach: constructor JdbcRDD in class
JdbcRDD<T> cannot be applied to given types. Not to mention that
JavaJdbcRDDSuite somehow manages to not pass in the class tag (the last
argument).
Take a look at
https://github.com/apache/spark/blob/v1.2.1/core/src/test/java/org/apache/spark/JavaJdbcRDDSuite.java
On Wed, Feb 18, 2015 at 11:14 AM, dgoldenberg dgoldenberg...@gmail.com
wrote:
I'm reading data from a database using JdbcRDD, in Java, and I have an
implementation of
Is the sc there a SparkContext or a JavaSparkContext? The compilation error
seems to indicate the former, but JdbcRDD.create expects the latter.
On Wed, Feb 18, 2015 at 12:30 PM, Dmitry Goldenberg
dgoldenberg...@gmail.com wrote:
I have tried that as well, I get a compile error --
[ERROR]
That test I linked
https://github.com/apache/spark/blob/v1.2.1/core/src/test/java/org/apache/spark/JavaJdbcRDDSuite.java#L90
is calling a static method JdbcRDD.create, not new JdbcRDD. Is that what
you tried doing?
On Wed, Feb 18, 2015 at 12:00 PM, Dmitry Goldenberg
dgoldenberg...@gmail.com
I have tried that as well, I get a compile error --
[ERROR] ...SparkProto.java:[105,39] error: no suitable method found for
create(SparkContext, <anonymous ConnectionFactory>, String, int, int, int,
<anonymous Function<ResultSet,Integer>>)
The code is a copy and paste:
JavaRDD<Integer> jdbcRDD =
Cody, you were right, I had a copy and paste snag where I ended up with a
vanilla SparkContext rather than a Java one. I also had to *not* use my
function subclasses, rather just use anonymous inner classes for the
Function stuff and that got things working. I'm fully following
the JdbcRDD.create
Can't you implement the
org.apache.spark.api.java.function.Function
interface and pass an instance of that to JdbcRDD.create?
On Wed, Feb 18, 2015 at 3:48 PM, Dmitry Goldenberg dgoldenberg...@gmail.com
wrote:
Cody, you were right, I had a copy and paste snag where I ended up with a
vanilla
That's exactly what I was doing. However, I ran into runtime issues with
doing that. For instance, I had a
public class DbConnection extends AbstractFunction0<Connection>
implements Serializable
I got a runtime error from Spark complaining that DbConnection wasn't an
instance of scala.Function0.
I'll add that there is a JDBC connector for the Spark SQL data sources API
in the works, and this will work with Python (through the standard
SchemaRDD type conversions).
On Mon, Jan 5, 2015 at 7:09 AM, Cody Koeninger c...@koeninger.org wrote:
JavaDataBaseConnectivity is, as far as I know, JVM specific. The JdbcRDD
is expecting to deal with Jdbc Connection and ResultSet objects.
I haven't done any python development in over a decade, but if someone
wants to work together on a python equivalent I'd be happy to help out.
The original
yeah.. i went through the source, and unless i'm missing something it's
not.. agreed, i'd love to see it implemented!
On Fri, Jan 2, 2015 at 3:59 PM, Tim Schweichler
tim.schweich...@healthination.com wrote:
Doesn't look like it is at the moment. If that's the case I'd love to see it
implemented.
From: elliott cordo elliottco...@gmail.com
Date: Friday, January 2, 2015 at 8:17 AM
To: user@spark.apache.org
Hi,
I encountered the same issue and solved it. Please check my blog post
http://www.sparkexpert.com/2015/01/02/load-database-data-into-spark-using-jdbcrdd-in-java/
Thank you
Hi,
I wrote a blog post about this.
http://www.sparkexpert.com/2015/01/02/load-database-data-into-spark-using-jdbcrdd-in-java/
I also had the same problem using JdbcRDD from Java.
For me, I have written a class in Scala to get a JdbcRDD, and I call this
instance from Java.
For instance, JdbcRDDWrapper.scala looks like this:
...
import java.sql._
import org.apache.spark.SparkContext
import org.apache.spark.rdd.JdbcRDD
import
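The rest of the wrapper is truncated above; a minimal sketch of what it
might look like (hypothetical names, building on the imports shown):

object JdbcRDDWrapper {
  // Hide the Scala function and ClassTag plumbing behind plain arguments,
  // so the RDD can be constructed from Java without using Scala types.
  def create(sc: SparkContext, url: String, user: String, pass: String,
             sql: String, lower: Long, upper: Long, numPartitions: Int): JdbcRDD[Int] =
    new JdbcRDD(sc,
      () => DriverManager.getConnection(url, user, pass),
      sql, lower, upper, numPartitions,
      (r: ResultSet) => r.getInt(1))
}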
That declaration looks OK for Java 8, at least when I tried it just now
against master. The only thing I see wrong here is that getInt throws an
exception, which means the lambda has to be more complicated than this.
This is Java code here calling the constructor so yes it can work fine
from Java (8).
On
https://spark.apache.org/docs/latest/api/java/org/apache/spark/rdd/JdbcRDD.html#JdbcRDD(org.apache.spark.SparkContext,
scala.Function0, java.lang.String, long, long, int, scala.Function1,
scala.reflect.ClassTag)
I don't think there is a completely Java-friendly version of this
class. However you