... even though many of them are required.
I went hunting in the code, and I found that in JDBCRDD, when it resolves
the schema of a table, it passes in alwaysNullable=true to JdbcUtils,
which forces all columns to resolve as nullable.
https://github.com/apache/spark/blob/branch-2.3/sql/core/src
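A possible workaround, sketched below: read the table, then re-apply a schema with the nullability you know to be true. This is only a sketch; the URL, table, and column names are hypothetical.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

val spark = SparkSession.builder.appName("nullability-fix").getOrCreate()
val url = "jdbc:mysql://dbhost:3306/mydb"   // hypothetical
val connProps = new java.util.Properties()  // credentials omitted

// Every column read through the JDBC source comes back nullable.
val df = spark.read.jdbc(url, "my_table", connProps)

// Rebuild the schema, marking the columns we know are NOT NULL.
val fixedSchema = StructType(df.schema.map {
  case f if f.name == "id" => f.copy(nullable = false) // hypothetical column
  case f => f
})
// Re-apply the corrected schema without copying the data.
val fixedDf = spark.createDataFrame(df.rdd, fixedSchema)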
record))
.join(OracleDB_RDD))
.print();
Spark version 1.6, running in yarn cluster mode.
Hi Ninad, I believe the purpose of JdbcRDD is to use an RDBMS as an additional
data source during data processing; the main goal of Spark is still
analyzing data from HDFS-like file systems.
Using Spark as a data integration tool to transfer billions of records
from an RDBMS to HDFS etc. could work
Hi Team,
One of my client teams is trying to see if they can use Spark to source
data from an RDBMS instead of Sqoop. The data would be substantially large,
on the order of billions of records.
Reading the documentation, I am not sure whether JdbcRDD by design is
going to be able to scale well
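For scale, the usual approach is a partitioned read, so the extract runs as parallel range scans. A minimal sketch with the DataFrame API follows; the URL, table, column, and bounds are hypothetical, and an existing sqlContext is assumed.
// partitionColumn must be numeric; Spark issues one range scan per partition.
val df = sqlContext.read.format("jdbc").options(Map(
  "url" -> "jdbc:oracle:thin:@//dbhost:1521/SVC",
  "dbtable" -> "schema.big_table",
  "partitionColumn" -> "id",
  "lowerBound" -> "1",
  "upperBound" -> "2000000000",
  "numPartitions" -> "200" // 200 parallel range scans over id
)).load()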
Hello Spark experts!
I am new to Spark and I have the following query.
What I am trying to do: run a Spark 1.5.1 job with master local[*] on a
4-core CPU. This job queries an Oracle database and fetches 5000 records at
a time into a JdbcRDD, and I increase the number of partitions by 1 for
every 5000 records I fetch.
I
Hi Deenar,
Thanks for your valuable inputs.
Here is the situation: a source table does not have any column (unique
values, numeric and sequential) which is suitable as the partition column
to be specified for the JdbcRDD constructor or the DataSource API. How do
we proceed in this scenario, and also let me know if any default
approach Spark
HI All,
Please give me some inputs on the partition column to be used in the
DataSource API or JdbcRDD to define the lowerBound and upperBound values,
which are used to define the number of partitions. The issue is that my
source table does not have a numeric column which is sequential and unique
such that proper
HI All,
Please provide your inputs on the partition column to be used in the
DataSource API or JdbcRDD in a scenario where the source table does not
have a numeric column which is sequential and unique, such that proper
partitioning can take place in Spark
Regards,
Satish
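When no natural numeric key exists, one common workaround is to synthesize a partition column in a subquery and use that for the bounds. A sketch below, assuming an Oracle source (ora_hash); other databases have similar hash or row-number functions, and all names here are hypothetical.
import java.util.Properties

val url = "jdbc:oracle:thin:@//dbhost:1521/SVC" // hypothetical
val connProps = new Properties()                // credentials omitted
// Derive a bucket column (0..31) so Spark can split the read into ranges.
val query =
  "(select t.*, mod(ora_hash(rowid), 32) as bucket from schema.source_table t) src"
val df = sqlContext.read.jdbc(
  url,
  query,
  "bucket", // partitionColumn
  0L, 31L,  // lowerBound, upperBound over bucket
  32,       // numPartitions
  connProps)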
HI Deenar,
Please find the SQL query below:
var SQL_RDD = new JdbcRDD(sc, () =>
DriverManager.getConnection(url, user, pass),
"select col1, col2, col3..col 37 from schema.Table LIMIT ? OFFSET ?",
100, 0, 1, (r: ResultSet) => (r.getInt("col1"), r.getInt("col2"
On 24 September 2015 at 17:48, Deenar Toraskar <
deenar.toras...@thinkreactive.co.uk> wrote:
> you are interpreting the JDBCRDD API incorrectly. If you want to use
> partitions, then the column used to partition and present in the where
> clause must be numeric and the lower bound
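In other words, the two '?' placeholders are bound to each partition's slice of [lowerBound, upperBound] on a numeric column, not to LIMIT/OFFSET values. A corrected sketch follows; the column names and bounds are hypothetical.
import java.sql.{DriverManager, ResultSet}
import org.apache.spark.rdd.JdbcRDD

val url = "jdbc:oracle:thin:@//dbhost:1521/SVC" // hypothetical
val (user, pass) = ("scott", "tiger")           // hypothetical

val rdd = new JdbcRDD(
  sc,
  () => DriverManager.getConnection(url, user, pass),
  // Spark substitutes each partition's id range into the two '?'s.
  "select col1, col2 from schema.Table where id >= ? and id <= ?",
  1L, 1000000L, // lowerBound and upperBound of the id column
  10,           // numPartitions: ten non-overlapping id ranges
  (r: ResultSet) => (r.getInt("col1"), r.getInt("col2")))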
Which version of Spark are you using? I can get correct results using
JdbcRDD. In fact there is a test suite precisely for this (JdbcRDDSuite).
I changed it according to your input and got correct results from this test
suite.
On Wed, Sep 23, 2015 at 11:00 AM, satish chandra j <jsatishc
HI,
Currently using Spark 1.2.2. Could you please let me know the correct
results output count which you got by using JdbcRDDSuite?
Regards,
Satish Chandra
HI All,
The JdbcRDD constructor has the following parameters:
JdbcRDD
<https://spark.apache.org/docs/1.2.0/api/java/org/apache/spark/rdd/JdbcRDD.html#JdbcRDD(org.apache.spark.SparkContext,
scala.Function0, java.lang.String, long, long, int, scala.Function1,
scala.reflect.ClassTag)>(SparkContext
Hi,
While using a reference within a JdbcRDD, it throws a serialization exception.
Does JdbcRDD not accept references from other parts of the code?
confMap = ConfFactory.getConf(ParquetStreaming)
val jdbcRDD = new JdbcRDD(sc, () => {
Class.forN
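The usual cause: the connection closure captures a non-serializable object (here, whatever ConfFactory returns). A sketch of the standard fix, copying the needed values into local vals before the closure; the config keys and driver class are hypothetical.
import java.sql.DriverManager

// Extract plain (serializable) Strings on the driver first.
val url  = confMap("jdbc.url")  // hypothetical keys into confMap
val user = confMap("jdbc.user")
val pass = confMap("jdbc.pass")

val jdbcRDD = new JdbcRDD(sc, () => {
  // The closure now captures only the three Strings above, not confMap.
  Class.forName("oracle.jdbc.OracleDriver") // hypothetical driver
  DriverManager.getConnection(url, user, pass)
}, sql, lowerBound, upperBound, numPartitions, mapRow) // reuse the original arguments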
Thanks Sujee :)
Hi all!
What is the most efficient way to convert a JdbcRDD to a DataFrame?
Any example?
Thanks
Use the built-in JDBC data source:
https://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases
On Mon, Jul 6, 2015 at 6:42 AM, Hafiz Mujadid hafizmujadi...@gmail.com
wrote:
Hi all!
what is the most efficient way to convert jdbcRDD to DataFrame.
any example
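A minimal sketch of that answer: read the table directly as a DataFrame rather than converting a JdbcRDD by hand. The URL, table, and credentials are hypothetical, and an existing sqlContext (Spark 1.4+) is assumed.
val df = sqlContext.read.format("jdbc")
  .option("url", "jdbc:mysql://dbhost:3306/mydb")
  .option("dbtable", "my_table")
  .option("user", "user")
  .option("password", "secret")
  .load()
df.printSchema() // schema is inferred from the table metadata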
Hi,
What are the options in the DataFrame/JdbcRDD save/saveAsTable APIs?
Is there any option to override/update a particular column in the table,
instead of overwriting the whole table, based on some ID column?
SaveMode.Append is there, but it won't help us update the record; it will
append/add a new row
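The JDBC save modes can only append or overwrite, so a per-row update has to go through plain JDBC, for example from foreachPartition. A sketch, assuming a MySQL-style upsert; the table, columns, and connection details are hypothetical.
import java.sql.DriverManager

val (url, user, pass) = ("jdbc:mysql://dbhost/db", "user", "secret") // hypothetical

df.foreachPartition { rows =>
  val conn = DriverManager.getConnection(url, user, pass)
  val stmt = conn.prepareStatement(
    "insert into target(id, col1) values (?, ?) " +
    "on duplicate key update col1 = values(col1)") // MySQL upsert syntax
  rows.foreach { row =>
    stmt.setLong(1, row.getLong(0))
    stmt.setString(2, row.getString(1))
    stmt.addBatch()
  }
  stmt.executeBatch() // one batched round-trip per partition
  conn.close()
}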
Hi Team,
In my use case I need to sync the data with MSSQL for any operation in
MSSQL. As per my Spark knowledge, we have JdbcRDD, which will read data from
RDBMS tables with upper and lower limits.
Can someone please help: is there any API to sync data automatically from a
single RDBMS table for any DML
I would suggest looking at
https://github.com/datastax/spark-cassandra-connector
On Tue, Jun 16, 2015 at 4:01 AM, Hafiz Mujadid hafizmujadi...@gmail.com
wrote:
hi all!
is there a way to connect cassandra with jdbcRDD ?
Hi!
Can we load a HANA database table using the Spark JdbcRDD?
Thanks
Have you already tried using the Vertica hadoop input format with spark? I
don't know how it's implemented, but I'd hope that it has some notion of
vertica-specific shard locality (which JdbcRDD does not).
If you're really constrained to consuming the result set in a single
thread, whatever processing
, our
database doesn't support the above LIMIT syntax and we don't have a generic
way of partitioning the various queries.
As a result -- we started by forking JdbcRDD and made a version that
executes the SQL query once in getPartitions into a Vector and then hands
each worker node an index
I'm a little confused by your comments regarding LIMIT. There's nothing
about JdbcRDD that depends on limit. You just need to be able to partition
your data in some way such that it has numeric upper and lower bounds.
Primary key range scans, not limit, would ordinarily be the best way to do
Jorn: Vertica
Cody: I posited the limit just as an example of how jdbcrdd could be used least
invasively. Let's say we used a partition on a time field -- we would still
need to have N executions of those queries. The queries we have are very
intense and concurrency is an issue even
the query results into another table in
your database and then query that using the normal approach?
--eric
On 3/1/15 4:28 AM, michal.klo...@gmail.com wrote:
queries.
As a result -- we started by forking JdbcRDD and made a version that
executes the SQL query once in getPartitions into a Vector and then hands
each worker node an index and iterator. Here's a snippet of getPartitions
and compute:
override def getPartitions: Array[Partition] = {
//Compute
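The snippet above is cut off; below is a sketch of what that forked approach might look like as a self-contained RDD, under the stated design (the query runs once on the driver, each partition gets an index range). All names are hypothetical, and note the materialized rows are serialized out with the tasks, so this only suits result sets that fit in driver memory.
import scala.reflect.ClassTag
import org.apache.spark.{Partition, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

class MaterializedQueryRDD[T: ClassTag](
    sc: SparkContext,
    runQuery: () => Vector[T], // executes the SQL once, on the driver
    numSlices: Int) extends RDD[T](sc, Nil) {

  private val rows: Vector[T] = runQuery() // beware driver memory

  override def getPartitions: Array[Partition] =
    (0 until numSlices).map { i =>
      new Partition { override def index: Int = i }
    }.toArray

  override def compute(split: Partition, ctx: TaskContext): Iterator[T] = {
    // Hand each worker its index's slice of the materialized rows.
    val start = split.index * rows.size / numSlices
    val end = (split.index + 1) * rows.size / numSlices
    rows.slice(start, end).iterator
  }
}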
Yup, I did see that. Good point though, Cody. The mismatch was happening
for me when I was trying to get the 'new JdbcRDD' approach going. Once I
switched to the 'create' method things are working just fine. Was just able
to refactor the 'get connection' logic into a 'DbConnection implements
JdbcRDD.ConnectionFactory' and my 'map row' class is still 'MapRow
implements
. Of
course, a numeric primary key is going to be the most efficient way to do
that.
On Thu, Feb 19, 2015 at 8:57 AM, Dmitry Goldenberg
dgoldenberg...@gmail.com wrote:
Thanks, Cody. Yes, I originally started off by looking at that, but I get a
compile error if I try to use that approach: constructor JdbcRDD in class
JdbcRDD<T> cannot be applied to given types. Not to mention that
JavaJdbcRDDSuite somehow manages to not pass in the class tag (the last
argument
Take a look at
https://github.com/apache/spark/blob/v1.2.1/core/src/test/java/org/apache/spark/JavaJdbcRDDSuite.java
On Wed, Feb 18, 2015 at 11:14 AM, dgoldenberg dgoldenberg...@gmail.com
wrote:
I'm reading data from a database using JdbcRDD, in Java, and I have an
implementation
] ...SparkProto.java:[105,39] error: no suitable method found for
create(SparkContext,<anonymous ConnectionFactory>,String,int,int,int,
<anonymous Function<ResultSet,Integer>>)
The code is a copy and paste:
JavaRDD<Integer> jdbcRDD = JdbcRDD.create(
sc,
new JdbcRDD.ConnectionFactory
That test I linked
https://github.com/apache/spark/blob/v1.2.1/core/src/test/java/org/apache/spark/JavaJdbcRDDSuite.java#L90
is calling a static method JdbcRDD.create, not new JdbcRDD. Is that what
you tried doing?
On Wed, Feb 18, 2015 at 12:00 PM, Dmitry Goldenberg
dgoldenberg...@gmail.com
I have tried that as well, I get a compile error --
[ERROR] ...SparkProto.java:[105,39] error: no suitable method found for
create(SparkContext,<anonymous ConnectionFactory>,String,int,int,int,
<anonymous Function<ResultSet,Integer>>)
The code is a copy and paste:
JavaRDD<Integer> jdbcRDD = JdbcRDD.create(
sc,
new JdbcRDD.ConnectionFactory() {
public Connection getConnection() throws SQLException {
return
JavaDataBaseConnectivity is, as far as I know, JVM specific. The JdbcRDD
is expecting to deal with Jdbc Connection and ResultSet objects.
I haven't done any python development in over a decade, but if someone
wants to work together on a python equivalent I'd be happy to help out.
The original JdbcRDD implementation only took a little bit
Hi All -
Is JdbcRdd currently supported? I am having trouble finding any info or
examples.
Hi,
I encountered the same issue and solved it. Please check my blog post
http://www.sparkexpert.com/2015/01/02/load-database-data-into-spark-using-jdbcrdd-in-java/
Thank you
Hi,
I wrote a blog post about this.
http://www.sparkexpert.com/2015/01/02/load-database-data-into-spark-using-jdbcrdd-in-java/
Hi All,
I am new to Spark. I tried to connect to MySQL using Spark and want to
write the code in Java, but I am getting a runtime exception. I guess that
the issue is with the Function0 and Function1 objects being passed into
JdbcRDD.
I tried my level best and attached the code; can you please help us to fix
Thanks Akhil, but it is expecting Function1 instead of Function. I tried
writing a new class implementing Function1 but got an error. Can you
please help us get it resolved?
The JdbcRDD is created as:
JdbcRDD rdd = new JdbcRDD(sc, getConnection, sql, 0, 0, 1,
getResultset
Try changing this line (with lowerBound and upperBound both 0, the two '?'
placeholders are bound to 0 and 0, so the range query returns almost nothing):
JdbcRDD rdd = new JdbcRDD(sc, getConnection, sql, 0, 0, 1,
getResultset, ClassTag$.MODULE$.apply(String.class));
to
JdbcRDD rdd = new JdbcRDD(sc, getConnection, sql, 0, 100, 1,
getResultset, ClassTag$.MODULE$.apply(String
Hi,
Are there any examples of using JdbcRDD in Java available?
It's not clear what the last argument is in this example (
https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/rdd/JdbcRDDSuite.scala
):
sc = new SparkContext("local", "test")
val rdd = new JdbcRDD(
sc
I had the same problem using JdbcRDD from Java.
For me, I wrote a class in Scala to build the JdbcRDD, and I call this
instance from Java.
For instance, JdbcRDDWrapper.scala looks like this:
...
import java.sql._
import org.apache.spark.SparkContext
import org.apache.spark.rdd.JdbcRDD
import
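The wrapper above is cut off; here is a minimal sketch of what such a Scala wrapper might look like, hiding the ClassTag and Scala function types from Java callers. The query, bounds, and names are hypothetical.
import java.sql.{DriverManager, ResultSet}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.JdbcRDD

object JdbcRDDWrapper {
  // Java can call this without supplying a ClassTag or scala.Function1.
  def build(sc: SparkContext, url: String, user: String, pass: String): JdbcRDD[String] =
    new JdbcRDD(
      sc,
      () => DriverManager.getConnection(url, user, pass),
      "select name from people where id >= ? and id <= ?", // hypothetical query
      1L, 100L, // lowerBound, upperBound
      2,        // numPartitions
      (r: ResultSet) => r.getString(1))
}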
Thanks Kidong. I'll try your approach.
On Tue, Nov 18, 2014 at 4:22 PM, mykidong mykid...@gmail.com wrote:
Hi All,
I am trying to access SQL Server through JdbcRDD, but I am getting an error
on the ClassTag placeholder.
Here is the code which I wrote:
public void readFromDB() {
String sql = "Select * from Table_1 where values >= ? and
values
);
try
{
Class.forName("com.mysql.jdbc.Driver");
}
catch(Exception ex)
{
ex.printStackTrace();
System.exit(1);
}
Connection zconn =
DriverManager.getConnection("jdbc:mysql://localhost:3306/?user=azkaban&password=password");
JdbcRDD rdd = new JdbcRDD(sctx, new Z(), "SELECT * FROM spark WHERE
The following line of code indicates the constructor is not defined. The
only examples I can find of JdbcRDD usage are Scala examples. Does this work
in Java? Are there any examples? Thanks.
JdbcRDD<Integer> rdd = new JdbcRDD<Integer>(sp, () ->
ods.getConnection(), sql
hi,
Is there a simple example for JdbcRDD from Java and not Scala? I am trying
to figure out the last parameter in the constructor of JdbcRDD.
thanks
https://spark.apache.org/docs/latest/api/java/org/apache/spark/rdd/JdbcRDD.html#JdbcRDD(org.apache.spark.SparkContext,
scala.Function0, java.lang.String, long, long, int, scala.Function1,
scala.reflect.ClassTag)
I don't think there is a completely Java-friendly version of this
class. However you
Hi, thanks all. I have a few more questions on this:
suppose I don't want to pass a where clause in my SQL, is there a way
to do this?
Right now I am trying to modify the JdbcRDD class by removing all the
parameters for lower bound and upper bound, but I am getting runtime exceptions
Kc
On Jul 30, 2014 3:55 PM, srinivas kusamsrini...@gmail.com wrote:
Hi,
I am trying to get data from MySQL using JdbcRDD. The table has
three columns.
val url = "jdbc:mysql://localhost:3306/studentdata"
val username = "root"
val password = "root"
val mysqlrdd = new
Hi Srini,
I believe the JdbcRDD requires input splits based on ranges within the
query itself. As an example, you could adjust your query to something like:
SELECT * FROM student_info WHERE id >= ? AND id <= ?
Note that the values you've passed in '1, 20, 2' correspond to the lower
bound index
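Putting that together, a sketch of the corrected call from this thread: the three trailing numbers are lowerBound, upperBound, and numPartitions, and the column mapping is hypothetical. The url, username, and password are those from the quoted code above.
import java.sql.{DriverManager, ResultSet}
import org.apache.spark.rdd.JdbcRDD

val mysqlrdd = new JdbcRDD(
  sc,
  () => DriverManager.getConnection(url, username, password),
  "SELECT * FROM student_info WHERE id >= ? AND id <= ?",
  1, 20, 2, // ids 1..20 split across 2 partitions
  (r: ResultSet) => (r.getInt(1), r.getString(2), r.getString(3)))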
Hi Guys,
Any simplistic example for JDBCRDD for a newbie?
--
Ahmed Osama Ibrahim
ITSC International Technology Services Corporation
www.itscorpmd.com
Tel: +1 240 685 1444
Fax: +1 240 668 9841