Re: welcome a new batch of committers

2018-10-05 Thread Suresh Thalamati
Congratulations to all!

-suresh

On Wed, Oct 3, 2018 at 1:59 AM Reynold Xin  wrote:

> Hi all,
>
> The Apache Spark PMC has recently voted to add several new committers to
> the project, for their contributions:
>
> - Shane Knapp (contributor to infra)
> - Dongjoon Hyun (contributor to ORC support and other parts of Spark)
> - Kazuaki Ishizaki (contributor to Spark SQL)
> - Xingbo Jiang (contributor to Spark Core and SQL)
> - Yinan Li (contributor to Spark on Kubernetes)
> - Takeshi Yamamuro (contributor to Spark SQL)
>
> Please join me in welcoming them!
>
>


Re: Welcoming Tejas Patil as a Spark committer

2017-10-03 Thread Suresh Thalamati
Congratulations, Tejas!

-suresh

> On Sep 29, 2017, at 12:58 PM, Matei Zaharia  wrote:
> 
> Hi all,
> 
> The Spark PMC recently added Tejas Patil as a committer on the
> project. Tejas has been contributing across several areas of Spark for
> a while, focusing especially on scalability issues and SQL. Please
> join me in welcoming Tejas!
> 
> Matei
> 
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> 


-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Continue reading dataframe from file despite errors

2017-09-12 Thread Suresh Thalamati
Try the CSV option ("mode", "DROPMALFORMED"); that might skip the error 
records. 
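
Something like this (a sketch based on your snippet below; SomeSchema and 
"somepath" come from your own code) should do it:

// Rows that fail to parse against the schema (e.g. "south carolina" in an
// integer column) are dropped instead of aborting the whole job.
val df = spark.read
  .schema(SomeSchema)
  .option("sep", "\t")
  .option("mode", "DROPMALFORMED")   // skip malformed records
  .format("csv")
  .load("somepath")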


> On Sep 12, 2017, at 2:33 PM, jeff saremi  wrote:
> 
> should have added some of the exception to be clear:
> 
> 17/09/12 14:14:17 ERROR TaskSetManager: Task 0 in stage 15.0 failed 1 times; 
> aborting job
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 15.0 failed 1 times, most recent failure: Lost task 0.0 in stage 15.0 
> (TID 15, localhost, executor driver): java.lang.NumberFormatException: For 
> input string: "south carolina"
> at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> at java.lang.Integer.parseInt(Integer.java:580)
> at java.lang.Integer.parseInt(Integer.java:615)
> at 
> scala.collection.immutable.StringLike$class.toInt(StringLike.scala:272)
> at scala.collection.immutable.StringOps.toInt(StringOps.scala:29)
> at 
> org.apache.spark.sql.execution.datasources.csv.CSVTypeCast$.castTo(CSVInferSchema.scala:250)
> 
> From: jeff saremi 
> Sent: Tuesday, September 12, 2017 2:32:03 PM
> To: user@spark.apache.org
> Subject: Continue reading dataframe from file despite errors
>  
> I'm using a statement like the following to load my dataframe from some text 
> file
> Upon encountering the first error, the whole thing throws an exception and 
> processing stops.
> I'd like to continue loading even if that results in zero rows in my 
> dataframe. How can i do that?
> thanks
> 
> spark.read.schema(SomeSchema).option("sep", 
> "\t").format("csv").load("somepath")



Re: [VOTE] [SPIP] SPARK-15689: Data Source API V2 read path

2017-09-06 Thread Suresh Thalamati
+1 (non-binding)


> On Sep 6, 2017, at 7:29 PM, Wenchen Fan  wrote:
> 
> Hi all,
> 
> In the previous discussion, we decided to split the read and write path of 
> data source v2 into 2 SPIPs, and I'm sending this email to call a vote for 
> Data Source V2 read path only.
> 
> The full document of the Data Source API V2 is:
> https://docs.google.com/document/d/1n_vUVbF4KD3gxTmkNEon5qdQ-Z8qU5Frf6WMQZ6jJVM/edit
>  
> 
> 
> The ready-for-review PR that implements the basic infrastructure for the read 
> path is:
> https://github.com/apache/spark/pull/19136 
> 
> 
> The vote will be up for the next 72 hours. Please reply with your vote:
> 
> +1: Yeah, let's go forward and implement the SPIP.
> +0: Don't really care.
> -1: I don't think this is a good idea because of the following technical 
> reasons.
> 
> Thanks!



Re: Welcoming Saisai (Jerry) Shao as a committer

2017-08-28 Thread Suresh Thalamati
Congratulations, Jerry

> On Aug 28, 2017, at 6:28 PM, Matei Zaharia  wrote:
> 
> Hi everyone,
> 
> The PMC recently voted to add Saisai (Jerry) Shao as a committer. Saisai has 
> been contributing to many areas of the project for a long time, so it’s great 
> to see him join. Join me in thanking and congratulating him!
> 
> Matei
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> 


-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



[jira] [Created] (SPARK-21824) DROP TABLE should automatically drop any dependent referential constraints or raise error.

2017-08-24 Thread Suresh Thalamati (JIRA)
Suresh Thalamati created SPARK-21824:


 Summary: DROP TABLE should  automatically  drop any dependent 
referential constraints or raise error.
 Key: SPARK-21824
 URL: https://issues.apache.org/jira/browse/SPARK-21824
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Suresh Thalamati


DROP TABLE should raise an error if there are any dependent referential 
constraints, unless the user specifies CASCADE CONSTRAINTS.

Syntax :
{code:sql}
DROP TABLE  [CASCADE CONSTRAINTS]
{code}

Hive drops the referential constraints automatically. Oracle requires the user to 
specify the _CASCADE CONSTRAINTS_ clause to automatically drop the referential 
constraints; otherwise it raises an error. Should we stick to the *Hive behavior*?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-21823) ALTER TABLE table statements such as RENAME and CHANGE columns should raise error if there are any dependent constraints.

2017-08-24 Thread Suresh Thalamati (JIRA)
Suresh Thalamati created SPARK-21823:


 Summary: ALTER TABLE table statements  such as RENAME and CHANGE 
columns should  raise  error if there are any dependent constraints. 
 Key: SPARK-21823
 URL: https://issues.apache.org/jira/browse/SPARK-21823
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Suresh Thalamati


The following ALTER TABLE DDL statements will impact the informational 
constraints defined on a table:

{code:sql}
ALTER TABLE name RENAME TO new_name
ALTER TABLE name CHANGE column_name new_name new_type
{code}

Spark SQL should raise errors if there are informational constraints defined 
on the columns affected by the ALTER, and let the user drop the constraints before 
proceeding with the DDL. In the future we can enhance ALTER TABLE to 
automatically fix up the constraint definition in the catalog when possible, 
and not raise an error.

When Spark adds support for DROP/REPLACE of columns, those statements will also 
impact informational constraints.
{code:sql}
ALTER TABLE name DROP [COLUMN] column_name
ALTER TABLE name REPLACE COLUMNS (col_spec[, col_spec ...])
{code}




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21784) Add ALTER TABLE ADD CONSTRANT DDL to support defining primary key and foreign keys

2017-08-18 Thread Suresh Thalamati (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16131900#comment-16131900
 ] 

Suresh Thalamati commented on SPARK-21784:
--

I am working on the implementation of this task. 

> Add ALTER TABLE ADD CONSTRANT DDL to support defining primary key and foreign 
> keys
> --
>
> Key: SPARK-21784
> URL: https://issues.apache.org/jira/browse/SPARK-21784
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>    Reporter: Suresh Thalamati
>
> Currently Spark SQL does not have DDL support to define primary key and 
> foreign key constraints. This Jira is to add DDL support to define primary 
> key and foreign key informational constraints using ALTER TABLE syntax. These 
> constraints will be used in query optimization; you can find more details 
> about this in the spec in SPARK-19842.
> *Syntax :*
> {code}
> ALTER TABLE [db_name.]table_name ADD [CONSTRAINT constraintName]
>   (PRIMARY KEY (col_names) |
>   FOREIGN KEY (col_names) REFERENCES [db_name.]table_name [(col_names)])
>   [VALIDATE | NOVALIDATE] [RELY | NORELY]
> {code}
> Examples :
> {code:sql}
> ALTER TABLE employee ADD CONSTRAINT pk PRIMARY KEY(empno) VALIDATE RELY
> ALTER TABLE department ADD CONSTRAINT emp_fk FOREIGN KEY (mgrno) REFERENCES 
> employee(empno) NOVALIDATE NORELY
> {code}
> *Constraint name generated by the system:*
> {code:sql}
> ALTER TABLE department ADD PRIMARY KEY(deptno) VALIDATE RELY
> ALTER TABLE employee ADD FOREIGN KEY (workdept) REFERENCES department(deptno) 
> VALIDATE RELY;
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-21784) Add ALTER TABLE ADD CONSTRANT DDL to support defining primary key and foreign keys

2017-08-18 Thread Suresh Thalamati (JIRA)
Suresh Thalamati created SPARK-21784:


 Summary: Add ALTER TABLE ADD CONSTRANT DDL to support defining 
primary key and foreign keys
 Key: SPARK-21784
 URL: https://issues.apache.org/jira/browse/SPARK-21784
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Suresh Thalamati


Currently Spark SQL does not have DDL support to define primary key and 
foreign key constraints. This Jira is to add DDL support to define primary key 
and foreign key informational constraints using ALTER TABLE syntax. These 
constraints will be used in query optimization; you can find more details 
about this in the spec in SPARK-19842.

*Syntax :*
{code}
ALTER TABLE [db_name.]table_name ADD [CONSTRAINT constraintName]
  (PRIMARY KEY (col_names) |
  FOREIGN KEY (col_names) REFERENCES [db_name.]table_name [(col_names)])
  [VALIDATE | NOVALIDATE] [RELY | NORELY]
{code}
Examples :
{code:sql}
ALTER TABLE employee ADD CONSTRAINT pk PRIMARY KEY(empno) VALIDATE RELY
ALTER TABLE department ADD CONSTRAINT emp_fk FOREIGN KEY (mgrno) REFERENCES 
employee(empno) NOVALIDATE NORELY
{code}

*Constraint name generated by the system:*
{code:sql}
ALTER TABLE department ADD PRIMARY KEY(deptno) VALIDATE RELY
ALTER TABLE employee ADD FOREIGN KEY (workdept) REFERENCES department(deptno) 
VALIDATE RELY;
{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: Dataset count on database or parquet

2017-02-09 Thread Suresh Thalamati
If you have to get the data into parquet format for other reasons, then I 
think count() on the parquet should be better. If it is just the count you need 
from the database, sending dbtable = (select count(*) from ...) might be 
quicker; it will avoid unnecessary data transfer from the database to Spark.
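
For example, a rough sketch of pushing the count down through the dbtable 
option (the connection details and table name here are hypothetical):

// Only a single row comes back from the database. The exact JDBC type of
// count(*) varies by database, so read it back as a generic Number.
val countDf = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://localhost:5432/mydb")            // hypothetical
  .option("dbtable", "(select count(*) as cnt from some_table) as t") // pushed-down count
  .option("user", "dbuser")
  .option("password", "password")
  .load()
val rowCount = countDf.first().getAs[Number](0).longValue()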


Hope that helps
-suresh

> On Feb 8, 2017, at 2:58 AM, Rohit Verma  wrote:
> 
> Hi Which of the following is better approach for too many values in database
> 
>   final Dataset dataset = spark.sqlContext().read()
> .format("jdbc")
> .option("url", params.getJdbcUrl())
> .option("driver", params.getDriver())
> .option("dbtable", params.getSqlQuery())
> //.option("partitionColumn", hashFunction)
> //.option("lowerBound", 0)
> //.option("upperBound", 10)
> //.option("numPartitions", 10)
> //.option("oracle.jdbc.timezoneAsRegion", "false")
> .option("fetchSize", 10)
> .load();
> dataset.write().parquet(params.getPath());
> 
> // target is to get count of persisted rows.
> 
> 
> // approach 1 i.e getting count directly from dataset
> // as I understood this count will be transalted to jdbcRdd.count and 
> could be on database
> long count = dataset.count();
> //approach 2 i.e read back saved parquet and get count from it. 
> long count = spark.read().parquet(params.getPath()).count();
> 
> 
> Regards
> Rohit



Re: Dataframe fails to save to MySQL table in spark app, but succeeds in spark shell

2017-01-26 Thread Suresh Thalamati
I notice the columns are quoted with double quotes in the error message 
('"user","age","state")). By any chance did you override the MySQL JDBC dialect? 
By default, MySQL identifiers are quoted with a backtick (`):

override def quoteIdentifier(colName: String): String = {
  s"`$colName`"
}

Just wondering if the error you are running into is related to quotes. 
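
If you do end up needing a dialect with backtick quoting, a rough sketch of 
registering one would look like this (the object name is made up):

import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

// Spark's built-in MySQL dialect already quotes identifiers with backticks,
// so this is only needed if that dialect was overridden or unregistered.
object BacktickMySQLDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:mysql")
  override def quoteIdentifier(colName: String): String = s"`$colName`"
}

JdbcDialects.registerDialect(BacktickMySQLDialect)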

Thanks
-suresh


> On Jan 26, 2017, at 1:28 AM, Didac Gil  wrote:
> 
> Are you sure that "age" is a numeric field?
> 
> Even numeric, you could pass the "44" between quotes: 
> 
> INSERT into your_table ("user","age","state") VALUES ('user3','44','CT')
> 
> Are you sure there are no more fields that are specified as NOT NULL, and 
> that you did not provide a value (besides user, age and state)?
> 
> 
>> On 26 Jan 2017, at 04:42, Xuan Dzung Doan  
>> wrote:
>> 
>> Hi,
>> 
>> Spark version 2.1.0
>> MySQL community server version 5.7.17
>> MySQL Connector Java 5.1.40
>> 
>> I need to save a dataframe to a MySQL table. In spark shell, the following 
>> statement succeeds:
>> 
>> scala> df.write.mode(SaveMode.Append).format("jdbc").option("url", 
>> "jdbc:mysql://127.0.0.1:3306/mydb").option("dbtable", 
>> "person").option("user", "username").option("password", "password").save()
>> 
>> I write an app that basically does the same thing, issuing the same 
>> statement saving the same dataframe to the same MySQL table. I run it using 
>> spark-submit, but it fails, reporting some error in the SQL syntax. Here's 
>> the detailed stack trace:
>> 
>> 17/01/25 16:06:02 INFO DAGScheduler: Job 2 failed: save at 
>> DataIngestionJob.scala:119, took 0.159574 s
>> Exception in thread "main" org.apache.spark.SparkException: Job aborted due 
>> to stage failure: Task 0 in stage 2.0 failed 1 times, most recent failure: 
>> Lost task 0.0 in stage 2.0 (TID 3, localhost, executor driver): 
>> java.sql.BatchUpdateException: You have an error in your SQL syntax; check 
>> the manual that corresponds to your MySQL server version for the right 
>> syntax to use near '"user","age","state") VALUES ('user3',44,'CT')' at line 1
>>  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>  at 
>> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>>  at 
>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>>  at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
>>  at com.mysql.jdbc.Util.getInstance(Util.java:408)
>>  at 
>> com.mysql.jdbc.SQLError.createBatchUpdateException(SQLError.java:1162)
>>  at 
>> com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:1773)
>>  at 
>> com.mysql.jdbc.PreparedStatement.executeBatchInternal(PreparedStatement.java:1257)
>>  at com.mysql.jdbc.StatementImpl.executeBatch(StatementImpl.java:958)
>>  at 
>> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:597)
>>  at 
>> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:670)
>>  at 
>> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:670)
>>  at 
>> org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:925)
>>  at 
>> org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:925)
>>  at 
>> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1944)
>>  at 
>> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1944)
>>  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>>  at org.apache.spark.scheduler.Task.run(Task.scala:99)
>>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
>>  at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>  at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>  at java.lang.Thread.run(Thread.java:745)
>> Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You 
>> have an error in your SQL syntax; check the manual that corresponds to your 
>> MySQL server version for the right syntax to use near '"user","age","state") 
>> VALUES ('user3',44,'CT')' at line 1
>>  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>  at 
>> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>>  at 
>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>>  at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
>>  at com.mysql.jdbc.Util.getInstance(Util.java:408)
>>  at 

Re: welcoming Burak and Holden as committers

2017-01-24 Thread Suresh Thalamati
Congratulations Burak and Holden!

-suresh

> On Jan 24, 2017, at 10:13 AM, Reynold Xin  wrote:
> 
> Hi all,
> 
> Burak and Holden have recently been elected as Apache Spark committers.
> 
> Burak has been very active in a large number of areas in Spark, including 
> linear algebra, stats/maths functions in DataFrames, Python/R APIs for 
> DataFrames, dstream, and most recently Structured Streaming.
> 
> Holden has been a long time Spark contributor and evangelist. She has written 
> a few books on Spark, as well as frequent contributions to the Python API to 
> improve its usability and performance.
> 
> Please join me in welcoming the two!
> 
> 


-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



[jira] [Commented] (SPARK-19318) Docker test case failure: `SPARK-16625: General data types to be mapped to Oracle`

2017-01-20 Thread Suresh Thalamati (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832659#comment-15832659
 ] 

Suresh Thalamati commented on SPARK-19318:
--

I am looking into this test failure. 

> Docker test case failure: `SPARK-16625: General data types to be mapped to 
> Oracle`
> --
>
> Key: SPARK-19318
> URL: https://issues.apache.org/jira/browse/SPARK-19318
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Xiao Li
>
> = FINISHED o.a.s.sql.jdbc.OracleIntegrationSuite: 'SPARK-16625: General 
> data types to be mapped to Oracle' =
> - SPARK-16625: General data types to be mapped to Oracle *** FAILED ***
>   types.apply(9).equals("class java.sql.Date") was false 
> (OracleIntegrationSuite.scala:136)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17916) CSV data source treats empty string as null no matter what nullValue option is

2016-11-09 Thread Suresh Thalamati (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15651172#comment-15651172
 ] 

Suresh Thalamati commented on SPARK-17916:
--

@Eric Liang  If possible, can you please share the data and the expected 
output with the options you are using?

I am trying to fix this issue in PR: https://github.com/apache/spark/pull/12904



> CSV data source treats empty string as null no matter what nullValue option is
> --
>
> Key: SPARK-17916
> URL: https://issues.apache.org/jira/browse/SPARK-17916
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.1
>Reporter: Hossein Falaki
>
> When user configures {{nullValue}} in CSV data source, in addition to those 
> values, all empty string values are also converted to null.
> {code}
> data:
> col1,col2
> 1,"-"
> 2,""
> {code}
> {code}
> spark.read.format("csv").option("nullValue", "-")
> {code}
> We will find a null in both rows.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-18141) jdbc datasource read fails when quoted columns (eg:mixed case, reserved words) in source table are used in the filter.

2016-10-27 Thread Suresh Thalamati (JIRA)
Suresh Thalamati created SPARK-18141:


 Summary: jdbc datasource read fails when  quoted  columns 
(eg:mixed case, reserved words) in source table are used  in the filter.
 Key: SPARK-18141
 URL: https://issues.apache.org/jira/browse/SPARK-18141
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.1, 2.0.0
Reporter: Suresh Thalamati


create table t1("Name" text, "Id" integer)
insert into t1 values('Mike', 1)

val df = sqlContext.read.jdbc(jdbcUrl, "t1", new Properties)

df.filter("Id = 1").show()

df.filter("`Id` = 1").show()


Error :
Cause: org.postgresql.util.PSQLException: ERROR: column "id" does not exist
  Position: 35
  at 
org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2182)
  at 
org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1911)
  at 
org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:173)
  at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:622)
  at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:472)
  at org.postgresql.jdbc.PgStatement.executeQuery(PgStatement.java:386)
  at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:295)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)

I am working on fix for this issue, will submit PR soon.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17916) CSV data source treats empty string as null no matter what nullValue option is

2016-10-20 Thread Suresh Thalamati (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15593119#comment-15593119
 ] 

Suresh Thalamati commented on SPARK-17916:
--

Thank you for trying out the different scenarios. I think the output you are 
getting after setting the quote to empty is not what is expected in this case. 
You want "" to be recognized as an empty string, not as actual quotes in the output.

Example (Before my changes on 2.0.1 branch):

input:
col1,col2
1,"-"
2,""
3,
4,"A,B"

val df = spark.read.format("csv").option("nullValue", "\"-\"").option("quote", 
"").option("header", true).load("/Users/suresht/sparktests/emptystring.csv")
df: org.apache.spark.sql.DataFrame = [col1: string, col2: string]

scala> df.selectExpr("length(col2)").show
+------------+
|length(col2)|
+------------+
|        null|
|           2|
|        null|
|           2|
+------------+


scala> df.show
+----+----+
|col1|col2|
+----+----+
|   1|null|
|   2|  ""|
|   3|null|
|   4|  "A|
+----+----+





> CSV data source treats empty string as null no matter what nullValue option is
> --
>
> Key: SPARK-17916
> URL: https://issues.apache.org/jira/browse/SPARK-17916
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.1
>Reporter: Hossein Falaki
>
> When user configures {{nullValue}} in CSV data source, in addition to those 
> values, all empty string values are also converted to null.
> {code}
> data:
> col1,col2
> 1,"-"
> 2,""
> {code}
> {code}
> spark.read.format("csv").option("nullValue", "-")
> {code}
> We will find a null in both rows.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: welcoming Xiao Li as a committer

2016-10-04 Thread Suresh Thalamati
Congratulations, Xiao!



> On Oct 3, 2016, at 10:46 PM, Reynold Xin  wrote:
> 
> Hi all,
> 
> Xiao Li, aka gatorsmile, has recently been elected as an Apache Spark 
> committer. Xiao has been a super active contributor to Spark SQL. Congrats 
> and welcome, Xiao!
> 
> - Reynold
> 


-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Deep learning libraries for scala

2016-09-30 Thread Suresh Thalamati
Tensor frames

https://spark-packages.org/package/databricks/tensorframes 


Hope that helps
-suresh

> On Sep 30, 2016, at 8:00 PM, janardhan shetty  wrote:
> 
> Looking for scala dataframes in particular ?
> 
> On Fri, Sep 30, 2016 at 7:46 PM, Gavin Yue  > wrote:
> Skymind you could try. It is java
> 
> I never test though.
> 
> > On Sep 30, 2016, at 7:30 PM, janardhan shetty  > > wrote:
> >
> > Hi,
> >
> > Are there any good libraries which can be used for scala deep learning 
> > models ?
> > How can we integrate tensorflow with scala ML ?
> 



Re: [VOTE] Release Apache Spark 2.0.1 (RC3)

2016-09-27 Thread Suresh Thalamati

+1 (non-binding)

-suresh


> On Sep 26, 2016, at 11:11 PM, Jagadeesan As  wrote:
> 
> +1 (non binding)
>  
> Cheers,
> Jagadeesan A S
> 
> 
> 
> 
> From:Jean-Baptiste Onofré 
> To:dev@spark.apache.org
> Date:27-09-16 11:27 AM
> Subject:Re: [VOTE] Release Apache Spark 2.0.1 (RC3)
> 
> 
> 
> +1 (non binding)
> 
> Regards
> JB
> 
> On 09/27/2016 07:51 AM, Hyukjin Kwon wrote:
> > +1 (non-binding)
> >
> > 2016-09-27 13:22 GMT+09:00 Denny Lee  > >>:
> >
> > +1 on testing with Python2.
> >
> >
> > On Mon, Sep 26, 2016 at 3:13 PM Krishna Sankar  > >> wrote:
> >
> > I do run both Python and Scala. But via iPython/Python2 with my
> > own test code. Not running the tests from the distribution.
> > Cheers
> 
> >
> > On Mon, Sep 26, 2016 at 11:59 AM, Holden Karau
> >  > >> wrote:
> >
> > I'm seeing some test failures with Python 3 that could
> > definitely be environmental (going to rebuild my virtual env
> > and double check), I'm just wondering if other people are
> > also running the Python tests on this release or if everyone
> > is focused on the Scala tests?
> >
> > On Mon, Sep 26, 2016 at 11:48 AM, Maciej Bryński
> >  > >> wrote:
> >
> > +1
> > At last :)
> >
> > 2016-09-26 19:56 GMT+02:00 Sameer Agarwal
> >  > >>:
> >
> > +1 (non-binding)
> >
> > On Mon, Sep 26, 2016 at 9:54 AM, Davies Liu
> >  >  > >> wrote:
> >
> > +1 (non-binding)
> >
> > On Mon, Sep 26, 2016 at 9:36 AM, Joseph Bradley
> >  >  > >> wrote:
> > > +1
> > >
> > > On Mon, Sep 26, 2016 at 7:47 AM, Denny Lee
> >  >  > >> wrote:
> > >>
> > >> +1 (non-binding)
> > >> On Sun, Sep 25, 2016 at 23:20 Jeff Zhang
> >  > >> wrote:
> > >>>
> > >>> +1
> > >>>
> > >>> On Mon, Sep 26, 2016 at 2:03 PM,
> > Shixiong(Ryan) Zhu
> > >>>  >  > >> wrote:
> > 
> >  +1
> > 
> >  On Sun, Sep 25, 2016 at 10:43 PM, Pete Lee
> >  >  > >>
> >  wrote:
> > >
> > > +1
> > >
> > >
> > > On Sun, Sep 25, 2016 at 3:26 PM, Herman
> > van Hövell tot Westerflier
> > >  >  > >> wrote:
> > >>
> > >> +1 (non-binding)
> > >>
> > >> On Sun, Sep 25, 2016 at 2:05 PM, Ricardo
> > Almeida
> > >>  >  > >> wrote:
> > >>>
> > >>> +1 (non-binding)
> > >>>
> > >>> Built and tested on
> > >>> - Ubuntu 16.04 / OpenJDK 1.8.0_91
> > >>> - CentOS / Oracle Java 1.7.0_55
> > >>> (-Phadoop-2.7 -Dhadoop.version=2.7.3
> 

[jira] [Commented] (SPARK-14536) NPE in JDBCRDD when array column contains nulls (postgresql)

2016-09-22 Thread Suresh Thalamati (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514246#comment-15514246
 ] 

Suresh Thalamati commented on SPARK-14536:
--

Yes. SPARK-10186 already added array support for Postgres; the PR 
(https://github.com/apache/spark/pull/15192) I submitted in this Jira will 
address the NPE issue for null values.

SPARK-8500, from the title (Support for array types in JDBCRDD), sounds more 
generic than specific to Postgres, although the repro given is for Postgres.

> NPE in JDBCRDD when array column contains nulls (postgresql)
> 
>
> Key: SPARK-14536
> URL: https://issues.apache.org/jira/browse/SPARK-14536
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1
>Reporter: Jeremy Smith
>  Labels: NullPointerException
>
> At 
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala#L453
>  it is assumed that the JDBC driver will definitely return a non-null `Array` 
> object from the call to `getArray`, and that in the event of a null array it 
> will return an non-null `Array` object with a null underlying array.  But as 
> you can see here 
> https://github.com/pgjdbc/pgjdbc/blob/master/pgjdbc/src/main/java/org/postgresql/jdbc/PgResultSet.java#L387
>  that isn't the case, at least for PostgreSQL.  This causes a 
> `NullPointerException` whenever an array column contains null values. It 
> seems like the PostgreSQL JDBC driver is probably doing the wrong thing, but 
> even so there should be a null check in JDBCRDD.  I'm happy to submit a PR if 
> that would be helpful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14536) NPE in JDBCRDD when array column contains nulls (postgresql)

2016-09-22 Thread Suresh Thalamati (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514129#comment-15514129
 ] 

Suresh Thalamati commented on SPARK-14536:
--

[~sowen]  I am not sure why this issue got closed as a duplicate after I 
reopened it. Based on the test case I tried on master, this issue does not 
look like a duplicate to me, as I mentioned in my previous comment when I 
reopened the issue. The array data type is supported for Postgres.

Repro :
On postgresdb :
create table spark_array(a int , b text[])
insert into spark_array values(1 , null)
insert into spark_array values(1 , '{"AA", "BB"}')

val psqlProps = new java.util.Properties()
psqlProps.setProperty("user" , "user")
psqlProps.setProperty("password" , "password")

-- works fine
spark.read.jdbc("jdbc:postgresql://localhost:5432/pdb", "(select * from 
spark_array where b is not null) as a ", psqlProps).show() 

-- fails with error.
spark.read.jdbc("jdbc:postgresql://localhost:5432/pdb", "spark_array", 
psqlProps).show()   fails with following error:

Stack :
16/09/21 11:49:41 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 
localhost): java.lang.NullPointerException
at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$13.apply(JdbcUtils.scala:442)
at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$13.apply(JdbcUtils.scala:440)
at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:301)
at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:283)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
at 
org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)


> NPE in JDBCRDD when array column contains nulls (postgresql)
> 
>
> Key: SPARK-14536
> URL: https://issues.apache.org/jira/browse/SPARK-14536
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1
>Reporter: Jeremy Smith
>  Labels: NullPointerException
>
> At 
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala#L453
>  it is assumed that the JDBC driver will definitely return a non-null `Array` 
> object from the call to `getArray`, and that in the event of a null array it 
> will return an non-null `Array` object with a null underlying array.  But as 
> you can see here 
> https://github.com/pgjdbc/pgjdbc/blob/master/pgjdbc/src/main/java/org/postgresql/jdbc/PgResultSet.java#L387
>  that isn't the case, at least for PostgreSQL.  This causes a 
> `NullPointerException` whenever an array column contains null values. It 
> seems like the PostgreSQL JDBC driver is probably doing the wrong thing, but 
> even so there should be a null check in JDBCRDD.  I'm happy to submit a PR if 
> that would be helpful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-14536) NPE in JDBCRDD when array column contains nulls (postgresql)

2016-09-21 Thread Suresh Thalamati (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Thalamati reopened SPARK-14536:
--

SPARK-10186 added array data type support for Postgres in 1.6. The NPE issue 
still exists; I was able to reproduce it on master. 

> NPE in JDBCRDD when array column contains nulls (postgresql)
> 
>
> Key: SPARK-14536
> URL: https://issues.apache.org/jira/browse/SPARK-14536
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1
>Reporter: Jeremy Smith
>  Labels: NullPointerException
>
> At 
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala#L453
>  it is assumed that the JDBC driver will definitely return a non-null `Array` 
> object from the call to `getArray`, and that in the event of a null array it 
> will return an non-null `Array` object with a null underlying array.  But as 
> you can see here 
> https://github.com/pgjdbc/pgjdbc/blob/master/pgjdbc/src/main/java/org/postgresql/jdbc/PgResultSet.java#L387
>  that isn't the case, at least for PostgreSQL.  This causes a 
> `NullPointerException` whenever an array column contains null values. It 
> seems like the PostgreSQL JDBC driver is probably doing the wrong thing, but 
> even so there should be a null check in JDBCRDD.  I'm happy to submit a PR if 
> that would be helpful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: Spark_JDBC_Partitions

2016-09-13 Thread Suresh Thalamati
There is also another jdbc method in the DataFrame reader API to specify your 
own predicates for each partition. Using this you can control what is included 
in each partition.

val jdbcPartitionWhereClause = Array[String]("id < 100" , "id >=100 and id < 
200")
val df = spark.read.jdbc(
  urlWithUserAndPass,
  "TEST.PEOPLE",
  predicates = jdbcPartitionWhereClause,
  new Properties())


Hope that helps. 
-suresh


> On Sep 13, 2016, at 9:44 AM, Rabin Banerjee  
> wrote:
> 
> Trust me, Only thing that can help you in your situation is SQOOP oracle 
> direct connector which is known as  ORAOOP. Spark cannot do everything , 
> you need a OOZIE workflow which will trigger sqoop job with oracle direct 
> connector to pull the data then spark batch to process .
> 
> Hope it helps !!
> 
> On Tue, Sep 13, 2016 at 6:10 PM, Igor Racic  > wrote:
> Hi, 
> 
> One way can be to use NTILE function to partition data. 
> Example:
> 
> REM Creating test table
> create table Test_part as select * from ( select rownum rn from all_tables t1 
> ) where rn <= 1000;
> 
> REM Partition lines by Oracle block number, 11 partitions in this example. 
> select ntile(11) over( order by dbms_rowid.ROWID_BLOCK_NUMBER( rowid ) ) nt 
> from Test_part
> 
> 
> Let's see distribution: 
> 
> select nt, count(*) from ( select ntile(11) over( order by 
> dbms_rowid.ROWID_BLOCK_NUMBER( rowid ) ) nt from Test_part) group by nt;
> 
> NT   COUNT(*)
> -- --
>  1 10
>  6 10
> 11  9
>  2 10
>  4 10
>  5 10
>  8 10
>  3 10
>  7 10
>  9  9
> 10  9
> 
> 11 rows selected.
> ^^ It looks good. Sure feel free to chose any other condition to order your 
> lines as best suits your case
> 
> So you can 
> 1) have one session reading and then decide where line goes (1 reader )
> 2) Or do multiple reads by specifying partition number. Note that in this 
> case you read whole table n times (in parallel) and is more internsive on 
> read part. (multiple readers)
> 
> Regards, 
> Igor
> 
> 
> 
> 2016-09-11 0:46 GMT+02:00 Mich Talebzadeh  >:
> Good points
> 
> Unfortunately databump. expr, imp use binary format for import and export. 
> that cannot be used to import data into HDFS in a suitable way.
> 
> One can use what is known as flat,sh script to get data out tab or , 
> separated etc.
> 
> ROWNUM is a pseudocolumn (not a real column) that is available in a query. 
> The issue is that in a table of 280Million rows to get the position of the 
> row it will have to do a table scan since no index cannot be built on it 
> (assuming there is no other suitable index). Not ideal but can be done.
> 
> I think a better alternative is to use datapump to take that table to 
> DEV/TEST, add a sequence (like an IDENTITY column in Sybase), build a unique 
> index on the sequence column and do the partitioning there.
> 
> HTH
> 
> 
> 
> 
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> 
>  
> http://talebzadehmich.wordpress.com 
> 
> Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
> damage or destruction of data or any other property which may arise from 
> relying on this email's technical content is explicitly disclaimed. The 
> author will in no case be liable for any monetary damages arising from such 
> loss, damage or destruction.
>  
> 
> On 10 September 2016 at 22:37, ayan guha  > wrote:
> In oracle something called row num is present in every row.  You can create 
> an evenly distribution using that column. If it is one time work, try using 
> sqoop. Are you using Oracle's own appliance? Then you can use data pump format
> 
> On 11 Sep 2016 01:59, "Mich Talebzadeh"  > wrote:
> creating an Oracle sequence for a table of 200million is not going to be that 
> easy without changing the schema. It is possible to export that table from 
> prod and import it to DEV/TEST and create the sequence there.
> 
> If it is a FACT table then the foreign keys from the Dimension tables will be 
> bitmap indexes on the FACT table so they can be potentially used.
> 
> HTH
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> 
>  
> http://talebzadehmich.wordpress.com 
> 
> Disclaimer: Use it at your own risk. Any and 

[jira] [Commented] (SPARK-17473) jdbc docker tests are failing with java.lang.AbstractMethodError:

2016-09-09 Thread Suresh Thalamati (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15477624#comment-15477624
 ] 

Suresh Thalamati commented on SPARK-17473:
--

E-mail exchange on dev :
http://apache-spark-developers-list.1001551.n3.nabble.com/Unable-to-run-docker-jdbc-integrations-test-td18870.html

[~joshrosen] [~luciano resende]


> jdbc docker tests are failing with java.lang.AbstractMethodError:
> -
>
> Key: SPARK-17473
> URL: https://issues.apache.org/jira/browse/SPARK-17473
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Suresh Thalamati
>
> build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0  -Phive-thriftserver 
> -Phive -DskipTests clean install
> build/mvn -Pdocker-integration-tests -pl :spark-docker-integration-tests_2.11 
>  compile test
> Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512m; 
> support was removed in 8.0
> Discovery starting.
> Discovery completed in 200 milliseconds.
> Run starting. Expected test count is: 10
> MySQLIntegrationSuite:
> Error:
> 16/09/06 11:52:00 INFO BlockManagerMaster: Registered BlockManager 
> BlockManagerId(driver, 9.31.117.25, 51868)
> *** RUN ABORTED ***
>   java.lang.AbstractMethodError:
>   at 
> org.glassfish.jersey.model.internal.CommonConfig.configureAutoDiscoverableProviders(CommonConfig.java:622)
>   at 
> org.glassfish.jersey.client.ClientConfig$State.configureAutoDiscoverableProviders(ClientConfig.java:357)
>   at 
> org.glassfish.jersey.client.ClientConfig$State.initRuntime(ClientConfig.java:392)
>   at 
> org.glassfish.jersey.client.ClientConfig$State.access$000(ClientConfig.java:88)
>   at 
> org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:120)
>   at 
> org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:117)
>   at 
> org.glassfish.jersey.internal.util.collection.Values$LazyValueImpl.get(Values.java:340)
>   at 
> org.glassfish.jersey.client.ClientConfig.getRuntime(ClientConfig.java:726)
>   at 
> org.glassfish.jersey.client.ClientRequest.getConfiguration(ClientRequest.java:285)
>   at 
> org.glassfish.jersey.client.JerseyInvocation.validateHttpMethodAndEntity(JerseyInvocation.java:126)
>   ...
> 16/09/06 11:52:00 INFO SparkContext: Invoking stop() from shutdown hook
> 16/09/06 11:52:00 INFO MapOutputTrackerMasterEndpoint: 
> MapOutputTrackerMasterEndpoint stopped!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: Unable to run docker jdbc integrations test ?

2016-09-09 Thread Suresh Thalamati
I agree with Josh. These tests are valuable, even if they cannot be run on 
Jenkins due to setup issues. It will be good to run them at least manually when 
JDBC data source specific changes are made. Filed a Jira for this problem. 

https://issues.apache.org/jira/browse/SPARK-17473



> On Sep 7, 2016, at 4:58 PM, Luciano Resende <luckbr1...@gmail.com> wrote:
> 
> That might be a reasonable and much more simpler approach to try... but if we 
> resolve these issues, we should make it part of some frequent build to make 
> sure the build don't regress and that the actual functionality don't regress 
> either. Let me look into this again...
> 
> On Wed, Sep 7, 2016 at 2:46 PM, Josh Rosen <joshro...@databricks.com 
> <mailto:joshro...@databricks.com>> wrote:
> I think that these tests are valuable so I'd like to keep them. If possible, 
> though, we should try to get rid of our dependency on the Spotify 
> docker-client library, since it's a dependency hell nightmare. Given our 
> relatively simple use of Docker here, I wonder whether we could just write 
> some simple scripting over the `docker` command-line tool instead of pulling 
> in such a problematic library.
> 
> On Wed, Sep 7, 2016 at 2:36 PM Luciano Resende <luckbr1...@gmail.com 
> <mailto:luckbr1...@gmail.com>> wrote:
> It looks like there is nobody running these tests, and after some dependency 
> upgrades in Spark 2.0 this has stopped working. I have tried to bring up this 
> but I am having some issues with getting the right dependencies loaded and 
> satisfying the docker-client expectations. 
> 
> The question then is: Does the community find value on having these tests 
> available ? Then we can focus on bringing them up and I can go push my 
> previous experiments as a WIP PR. Otherwise we should just get rid of these 
> tests.
> 
> Thoughts ?
> 
> 
> On Tue, Sep 6, 2016 at 4:05 PM, Suresh Thalamati <suresh.thalam...@gmail.com 
> <mailto:suresh.thalam...@gmail.com>> wrote:
> Hi, 
> 
> 
> I am getting the following error , when I am trying to run jdbc docker 
> integration tests on my laptop.   Any ideas , what I might be be doing wrong ?
> 
> build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0  -Phive-thriftserver 
> -Phive -DskipTests clean install
> build/mvn -Pdocker-integration-tests -pl :spark-docker-integration-tests_2.11 
>  compile test
> 
> Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512m; 
> support was removed in 8.0
> Discovery starting.
> Discovery completed in 200 milliseconds.
> Run starting. Expected test count is: 10
> MySQLIntegrationSuite:
> 
> Error:
> 16/09/06 11:52:00 INFO BlockManagerMaster: Registered BlockManager 
> BlockManagerId(driver, 9.31.117.25, 51868)
> *** RUN ABORTED ***
>   java.lang.AbstractMethodError:
>   at 
> org.glassfish.jersey.model.internal.CommonConfig.configureAutoDiscoverableProviders(CommonConfig.java:622)
>   at 
> org.glassfish.jersey.client.ClientConfig$State.configureAutoDiscoverableProviders(ClientConfig.java:357)
>   at 
> org.glassfish.jersey.client.ClientConfig$State.initRuntime(ClientConfig.java:392)
>   at 
> org.glassfish.jersey.client.ClientConfig$State.access$000(ClientConfig.java:88)
>   at 
> org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:120)
>   at 
> org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:117)
>   at 
> org.glassfish.jersey.internal.util.collection.Values$LazyValueImpl.get(Values.java:340)
>   at 
> org.glassfish.jersey.client.ClientConfig.getRuntime(ClientConfig.java:726)
>   at 
> org.glassfish.jersey.client.ClientRequest.getConfiguration(ClientRequest.java:285)
>   at 
> org.glassfish.jersey.client.JerseyInvocation.validateHttpMethodAndEntity(JerseyInvocation.java:126)
>   ...
> 16/09/06 11:52:00 INFO SparkContext: Invoking stop() from shutdown hook
> 16/09/06 11:52:00 INFO MapOutputTrackerMasterEndpoint: 
> MapOutputTrackerMasterEndpoint stopped!
> 
> 
> 
> Thanks
> -suresh
> 
> 
> 
> 
> -- 
> Luciano Resende
> http://twitter.com/lresende1975 <http://twitter.com/lresende1975>
> http://lresende.blogspot.com/ <http://lresende.blogspot.com/>
> 
> 
> -- 
> Luciano Resende
> http://twitter.com/lresende1975 <http://twitter.com/lresende1975>
> http://lresende.blogspot.com/ <http://lresende.blogspot.com/>


[jira] [Created] (SPARK-17473) jdbc docker tests are failing with java.lang.AbstractMethodError:

2016-09-09 Thread Suresh Thalamati (JIRA)
Suresh Thalamati created SPARK-17473:


 Summary: jdbc docker tests are failing with 
java.lang.AbstractMethodError:
 Key: SPARK-17473
 URL: https://issues.apache.org/jira/browse/SPARK-17473
 Project: Spark
  Issue Type: Test
  Components: SQL
Affects Versions: 2.0.0, 2.1.0
Reporter: Suresh Thalamati


build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0  -Phive-thriftserver 
-Phive -DskipTests clean install
build/mvn -Pdocker-integration-tests -pl :spark-docker-integration-tests_2.11  
compile test

Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512m; 
support was removed in 8.0
Discovery starting.
Discovery completed in 200 milliseconds.
Run starting. Expected test count is: 10
MySQLIntegrationSuite:

Error:
16/09/06 11:52:00 INFO BlockManagerMaster: Registered BlockManager 
BlockManagerId(driver, 9.31.117.25, 51868)
*** RUN ABORTED ***
  java.lang.AbstractMethodError:
  at 
org.glassfish.jersey.model.internal.CommonConfig.configureAutoDiscoverableProviders(CommonConfig.java:622)
  at 
org.glassfish.jersey.client.ClientConfig$State.configureAutoDiscoverableProviders(ClientConfig.java:357)
  at 
org.glassfish.jersey.client.ClientConfig$State.initRuntime(ClientConfig.java:392)
  at 
org.glassfish.jersey.client.ClientConfig$State.access$000(ClientConfig.java:88)
  at org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:120)
  at org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:117)
  at 
org.glassfish.jersey.internal.util.collection.Values$LazyValueImpl.get(Values.java:340)
  at org.glassfish.jersey.client.ClientConfig.getRuntime(ClientConfig.java:726)
  at 
org.glassfish.jersey.client.ClientRequest.getConfiguration(ClientRequest.java:285)
  at 
org.glassfish.jersey.client.JerseyInvocation.validateHttpMethodAndEntity(JerseyInvocation.java:126)
  ...
16/09/06 11:52:00 INFO SparkContext: Invoking stop() from shutdown hook
16/09/06 11:52:00 INFO MapOutputTrackerMasterEndpoint: 
MapOutputTrackerMasterEndpoint stopped!




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Unable to run docker jdbc integrations test ?

2016-09-06 Thread Suresh Thalamati
Hi, 


I am getting the following error , when I am trying to run jdbc docker 
integration tests on my laptop.   Any ideas , what I might be be doing wrong ?

build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0  -Phive-thriftserver 
-Phive -DskipTests clean install
build/mvn -Pdocker-integration-tests -pl :spark-docker-integration-tests_2.11  
compile test

Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512m; 
support was removed in 8.0
Discovery starting.
Discovery completed in 200 milliseconds.
Run starting. Expected test count is: 10
MySQLIntegrationSuite:

Error:
16/09/06 11:52:00 INFO BlockManagerMaster: Registered BlockManager 
BlockManagerId(driver, 9.31.117.25, 51868)
*** RUN ABORTED ***
  java.lang.AbstractMethodError:
  at 
org.glassfish.jersey.model.internal.CommonConfig.configureAutoDiscoverableProviders(CommonConfig.java:622)
  at 
org.glassfish.jersey.client.ClientConfig$State.configureAutoDiscoverableProviders(ClientConfig.java:357)
  at 
org.glassfish.jersey.client.ClientConfig$State.initRuntime(ClientConfig.java:392)
  at 
org.glassfish.jersey.client.ClientConfig$State.access$000(ClientConfig.java:88)
  at org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:120)
  at org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:117)
  at 
org.glassfish.jersey.internal.util.collection.Values$LazyValueImpl.get(Values.java:340)
  at org.glassfish.jersey.client.ClientConfig.getRuntime(ClientConfig.java:726)
  at 
org.glassfish.jersey.client.ClientRequest.getConfiguration(ClientRequest.java:285)
  at 
org.glassfish.jersey.client.JerseyInvocation.validateHttpMethodAndEntity(JerseyInvocation.java:126)
  ...
16/09/06 11:52:00 INFO SparkContext: Invoking stop() from shutdown hook
16/09/06 11:52:00 INFO MapOutputTrackerMasterEndpoint: 
MapOutputTrackerMasterEndpoint stopped!



Thanks
-suresh



[jira] [Commented] (SPARK-17385) Update Data in mySql using spark

2016-09-02 Thread Suresh Thalamati (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15459827#comment-15459827
 ] 

Suresh Thalamati commented on SPARK-17385:
--

Update is not supported from Spark. The only supported save modes are append/overwrite. 
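
For reference, a minimal sketch (the URL, table name, and {{df}} are hypothetical) 
of the save modes the JDBC data source does support:

{code}
import java.util.Properties
import org.apache.spark.sql.SaveMode

val props = new Properties()
props.setProperty("user", "dbuser")
props.setProperty("password", "password")

// df is an existing DataFrame (hypothetical). Append adds the rows; Overwrite
// replaces the table contents. There is no update/upsert mode.
df.write
  .mode(SaveMode.Append)
  .jdbc("jdbc:mysql://localhost:3306/mydb", "my_table", props)
{code}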

> Update Data in mySql using spark
> 
>
> Key: SPARK-17385
> URL: https://issues.apache.org/jira/browse/SPARK-17385
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
> Environment: Linux
>Reporter: Farman Ali
>  Labels: Apache, DataSource, Spark, Sql, mysql
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> I am currently working on project that have mySql back-end DB .please tell me 
> how we update data in mysql using spark sql?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: Welcoming Felix Cheung as a committer

2016-08-08 Thread Suresh Thalamati
Congratulations , Felix!



> On Aug 8, 2016, at 11:15 AM, Ted Yu  wrote:
> 
> Congratulations, Felix.
> 
> On Mon, Aug 8, 2016 at 11:15 AM, Matei Zaharia  > wrote:
> Hi all,
> 
> The PMC recently voted to add Felix Cheung as a committer. Felix has been a 
> major contributor to SparkR and we're excited to have him join officially. 
> Congrats and welcome, Felix!
> 
> Matei
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org 
> 
> 
> 



Re: [VOTE] Release Apache Spark 2.0.0 (RC5)

2016-07-22 Thread Suresh Thalamati
+1 (non-binding)

Tested data source api , and jdbc data sources. 


> On Jul 19, 2016, at 7:35 PM, Reynold Xin  wrote:
> 
> Please vote on releasing the following candidate as Apache Spark version 
> 2.0.0. The vote is open until Friday, July 22, 2016 at 20:00 PDT and passes 
> if a majority of at least 3 +1 PMC votes are cast.
> 
> [ ] +1 Release this package as Apache Spark 2.0.0
> [ ] -1 Do not release this package because ...
> 
> 
> The tag to be voted on is v2.0.0-rc5 
> (13650fc58e1fcf2cf2a26ba11c819185ae1acc1f).
> 
> This release candidate resolves ~2500 issues: 
> https://s.apache.org/spark-2.0.0-jira 
> 
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-bin/ 
> 
> 
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc 
> 
> 
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1195/ 
> 
> 
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-docs/ 
> 
> 
> 
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking an 
> existing Spark workload and running on this release candidate, then reporting 
> any regressions from 1.x.
> 
> ==
> What justifies a -1 vote for this release?
> ==
> Critical bugs impacting major functionalities.
> 
> Bugs already present in 1.x, missing features, or bugs related to new 
> features will not necessarily block this release. Note that historically 
> Spark documentation has been published on the website separately from the 
> main release so we do not need to block the release due to documentation 
> errors either.
> 



[jira] [Resolved] (SPARK-14218) dataset show() does not display column names in the correct order if underlying data frame schema order is different from the encoder schema order.

2016-07-19 Thread Suresh Thalamati (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Thalamati resolved SPARK-14218.
--
Resolution: Duplicate

Verified on 2.0 and trunk. This issue is resolved.

> dataset show() does not display column names in the correct order if 
> underlying data frame schema order is different from the encoder schema 
> order. 
> 
>
> Key: SPARK-14218
> URL: https://issues.apache.org/jira/browse/SPARK-14218
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>    Reporter: Suresh Thalamati
>
> dataset show does not output column names correctly if the encoder schema 
> order is different from the underlying data frame schema order. 
> Repro :
> {code}
> case class emp(id: Int, name:String)
> val df = sqlContext.sql("select 'Mike', 2").toDF("name", "id")
> val ds = df.as[emp]
> ds.show
> +----+----+
> |name|  id|
> +----+----+
> |   2|Mike|
> +----+----+
> {code}
> Output column names should be "id, name" to match the data correctly.
> This works correctly in Spark 1.6. Output in Spark 1.6:
> {code}
> scala> ds.show
> +---++
> | id|name|
> +---++
> |  2|Mike|
> +---++
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: Welcoming Yanbo Liang as a committer

2016-06-04 Thread Suresh Thalamati
Congratulations, Yanbo

> On Jun 3, 2016, at 7:48 PM, Matei Zaharia  wrote:
> 
> Hi all,
> 
> The PMC recently voted to add Yanbo Liang as a committer. Yanbo has been a 
> super active contributor in many areas of MLlib. Please join me in welcoming 
> Yanbo!
> 
> Matei
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
> 


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[jira] [Commented] (SPARK-12504) JDBC data source credentials are not masked in the data frame explain output.

2016-05-25 Thread Suresh Thalamati (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15301392#comment-15301392
 ] 

Suresh Thalamati commented on SPARK-12504:
--

[~srowen]  I worked on resolving this issue. For some reason the Assignee is 
still Apache Spark. If you can change the Assignee, that would be great. 

> JDBC data source credentials are not masked in the data frame explain output.
> -
>
> Key: SPARK-12504
> URL: https://issues.apache.org/jira/browse/SPARK-12504
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Suresh Thalamati
>Assignee: Apache Spark
> Fix For: 2.0.0
>
>
> Currently JDBC data source credentials are not masked in the explain output. 
> This can lead to accidental leakage of credentials into logs and the UI. 
> SPARK-11206 added support for showing the SQL plan details in the History 
> Server. After this change query plans are also written to the event logs on 
> disk when event logging is enabled; in this case the credentials will leak into 
> the event logs, which can be accessed by file system admins.
> Repro :
> {code}
> val empdf = sqlContext.read.jdbc("jdbc:postgresql://localhost:5432/mydb", 
> "spark_emp", psqlProps)
> empdf.explain(true)
> {code}
> Plan output with credentials :
> {code}
> == Parsed Logical Plan == +details
> == Parsed Logical Plan ==
> Limit 21
> +- Relation[id#4,name#5] 
> JDBCRelation(jdbc:postgresql://localhost:5432/mydb,spark_emp,[Lorg.apache.spark.Partition;@3ff74546,{user=dbuser,
>  password=pwdata})
> == Analyzed Logical Plan ==
> id: int, name: string
> Limit 21
> +- Relation[id#4,name#5] 
> JDBCRelation(jdbc:postgresql://localhost:5432/mydb,spark_emp,[Lorg.apache.spark.Partition;@3ff74546,{user=dbuser,
>  password=pwdata})
> == Optimized Logical Plan ==
> Limit 21
> +- Relation[id#4,name#5] 
> JDBCRelation(jdbc:postgresql://localhost:5432/mydb,spark_emp,[Lorg.apache.spark.Partition;@3ff74546,{user=dbuser,
>  password=pwdata})
> == Physical Plan ==
> Limit 21
> +- Scan 
> JDBCRelation(jdbc:postgresql://localhost:5432/mydb,spark_emp,[Lorg.apache.spark.Partition;@3ff74546,{user=dbuser,
>  password=pwdata}) PushedFilter: [] [id#4,name#5]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15538) Truncate table does not work on data source table , and does not raise error either.

2016-05-25 Thread Suresh Thalamati (JIRA)
Suresh Thalamati created SPARK-15538:


 Summary: Truncate table does not work on data source table , and 
does not raise error either.
 Key: SPARK-15538
 URL: https://issues.apache.org/jira/browse/SPARK-15538
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Suresh Thalamati
Priority: Minor


Truncate table does not seem to work on data source tables. It returns success 
without any error, but the table is not truncated. 

Repro:
{code}
val df = Seq((1 , "john", "CA") ,(2,"Mike", "NY"), (3, "Robert", 
"CA")).toDF("id", "name", "state")
df.write.format("parquet").partitionBy("state").saveAsTable("emp")

scala> sql("truncate table emp") 
res8: org.apache.spark.sql.DataFrame = []

scala> sql("select * from emp").show ;
+---+--+-+
| id|  name|state|
+---+--+-+
|  3|Robert|   CA|
|  1|  john|   CA|
|  2|  Mike|   NY|
+---+--+-+

{code} 

The select should have returned no results. 

By scanning through the code I found that some of the other DDL commands, like 
LOAD DATA and SHOW PARTITIONS, are not allowed for data source tables and raise 
an error. 

It might be good to throw an error until truncate table works with data source 
tables also.
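
Until then, a possible workaround is to drop and recreate the table with an empty 
DataFrame of the same schema (a sketch only; it assumes the emp table from the 
repro above and a spark-shell session where sqlContext and sc are in scope):
{code}
// Sketch of a workaround, not part of any fix: recreate the table empty.
val schema = sqlContext.table("emp").schema
sqlContext.sql("DROP TABLE emp")
val empty = sqlContext.createDataFrame(sc.emptyRDD[org.apache.spark.sql.Row], schema)
empty.write.format("parquet").partitionBy("state").saveAsTable("emp")
{code}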
 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15538) Truncate table does not work on data source table , and does not raise error either.

2016-05-25 Thread Suresh Thalamati (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300867#comment-15300867
 ] 

Suresh Thalamati commented on SPARK-15538:
--

Working on a PR for this issue. 

> Truncate table does not work on data source table , and does not raise error 
> either.
> 
>
> Key: SPARK-15538
> URL: https://issues.apache.org/jira/browse/SPARK-15538
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Suresh Thalamati
>Priority: Minor
>
> Truncate table does not  seems to work on data source table. It returns 
> success without any error , but table is not truncated. 
> Repro:
> {code}
> val df = Seq((1 , "john", "CA") ,(2,"Mike", "NY"), (3, "Robert", 
> "CA")).toDF("id", "name", "state")
> df.write.format("parquet").partitionBy("state").saveAsTable("emp")
> scala> sql("truncate table emp") 
> res8: org.apache.spark.sql.DataFrame = []
> scala> sql("select * from emp").show ;
> +---+--+-+
> | id|  name|state|
> +---+--+-+
> |  3|Robert|   CA|
> |  1|  john|   CA|
> |  2|  Mike|   NY|
> +---+--+-+
> {code} 
> The select should have returned no results. 
> By scanning through  the code  I found  some of the other DDL commands like 
> LOAD DATA ,  and SHOW PARTITIONS are not allowed for data source table and 
> they raise error. 
> It  Might be good to throw error until the truncate table works with  data 
> source table also.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: JDBC SQL Server RDD

2016-05-17 Thread Suresh Thalamati
What is the error you are getting?

At least on the main code line, JDBCRDD is marked as private[sql]. A simple 
alternative might be to read from SQL Server using the DataFrame API and get 
the RDD from the DataFrame. 

eg:
val df = sqlContext.read.jdbc(
  "jdbc:sqlserver://usaecducc1ew1.ccgaco45mak.us-east-1.rds.amazonaws.com;database=ProdAWS;user=sa;password=?s3iY2mv6.H",
  "(select CTRY_NA,CTRY_SHRT_NA from dbo.CTRY)", new java.util.Properties())

val rdd = df.rdd 
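
Note: depending on the database, a subquery passed as the table argument usually 
needs an alias (this is an assumption about SQL Server's derived-table syntax, not 
something verified against your instance), e.g.:

val df = sqlContext.read.jdbc(
  "jdbc:sqlserver://usaecducc1ew1.ccgaco45mak.us-east-1.rds.amazonaws.com;database=ProdAWS;user=sa;password=?s3iY2mv6.H",
  "(select CTRY_NA,CTRY_SHRT_NA from dbo.CTRY) as ctry", new java.util.Properties())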


Hope that helps
-suresh

> On May 15, 2016, at 12:05 PM, KhajaAsmath Mohammed  
> wrote:
> 
> Hi ,
> 
> I am trying to test sql server connection with JDBC RDD but unable to connect.
> 
> val myRDD = new JdbcRDD( sparkContext, () => 
> DriverManager.getConnection(sqlServerConnectionString) ,
>   "select CTRY_NA,CTRY_SHRT_NA from dbo.CTRY limit ?, ?",
>   0, 5, 1, r => r.getString("CTRY_NA") + ", " + 
> r.getString("CTRY_SHRT_NA"))
> 
> 
> sqlServerConnectionString here is 
> jdbc:sqlserver://usaecducc1ew1.ccgaco45mak.us-east-1.rds.amazonaws.com 
> ;database=ProdAWS;user=sa;password=?s3iY2mv6.H
> 
> 
> can you please let me know what I am doing worng. I tried solutions from all 
> forums but didnt find any luck
> 
> Thanks,
> Asmath.



[jira] [Commented] (SPARK-15315) CSV datasource writes garbage for complex types instead of raising error.

2016-05-13 Thread Suresh Thalamati (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15283042#comment-15283042
 ] 

Suresh Thalamati commented on SPARK-15315:
--

I am working on a PR for this issue and will submit it soon. 

> CSV datasource writes garbage  for complex  types instead of rasing error. 
> ---
>
> Key: SPARK-15315
> URL: https://issues.apache.org/jira/browse/SPARK-15315
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Suresh Thalamati
>Priority: Minor
>
> Complex types support does not seem to exist yet for csv writer.  Error 
> should be raised instead of writing garbage into the files. 
> Read throw error java.lang.RuntimeException: Unsupported type: map 
> Repro:
> {code}
> Seq((1, Map("Tesla" -> 3))).toDF("id", "cars").write.csv("/tmp/output")
> cat part-r-7-f684166f-ac28-48a1-99dc-1bb2c2ef1164.csv
> 1,org.apache.spark.sql.catalyst.expressions.UnsafeMapData@6d10824e
> Seq((1, Array("Tesla"))).toDF("id", 
> "cars").write.mode("overwrite").csv("/tmp/output")
> cat part-r-7-5bc39b53-25b6-4802-a98d-ec0cbd0e189a.csv 
> 1,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@a7e180b5
> scala> Seq((1, "Tesla")).toDF("id", "name").select(struct("id", 
> "name")).write.mode("overwrite").csv("/tmp/output")
> cat part-r-7-e0c688e1-e04d-4fa0-baf9-31183871ef6d.csv 
> "[0,1,180005,616c736554]"
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15315) CSV datasource writes garbage for complex types instead of raising error.

2016-05-13 Thread Suresh Thalamati (JIRA)
Suresh Thalamati created SPARK-15315:


 Summary: CSV datasource writes garbage for complex types instead 
of raising error. 
 Key: SPARK-15315
 URL: https://issues.apache.org/jira/browse/SPARK-15315
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Suresh Thalamati
Priority: Minor


Complex type support does not seem to exist yet for the CSV writer. An error should 
be raised instead of writing garbage into the files. 

Reads throw the error java.lang.RuntimeException: Unsupported type: map 

Repro:
{code}
Seq((1, Map("Tesla" -> 3))).toDF("id", "cars").write.csv("/tmp/output")

cat part-r-7-f684166f-ac28-48a1-99dc-1bb2c2ef1164.csv
1,org.apache.spark.sql.catalyst.expressions.UnsafeMapData@6d10824e

Seq((1, Array("Tesla"))).toDF("id", 
"cars").write.mode("overwrite").csv("/tmp/output")

cat part-r-7-5bc39b53-25b6-4802-a98d-ec0cbd0e189a.csv 
1,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@a7e180b5


scala> Seq((1, "Tesla")).toDF("id", "name").select(struct("id", 
"name")).write.mode("overwrite").csv("/tmp/output")

cat part-r-7-e0c688e1-e04d-4fa0-baf9-31183871ef6d.csv 
"[0,1,180005,616c736554]"
{code}
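
Until the writer either supports complex types or raises a proper error, a possible 
interim workaround (a sketch only; the UDF and separator are illustrative, and this 
assumes a spark-shell session where sqlContext is available) is to stringify complex 
columns before writing CSV:
{code}
import org.apache.spark.sql.functions.udf
import sqlContext.implicits._

// Illustrative helper: render the map as "key:value" pairs before writing.
val mapToString = udf((m: Map[String, Int]) =>
  if (m == null) null else m.map { case (k, v) => s"$k:$v" }.mkString(";"))

Seq((1, Map("Tesla" -> 3))).toDF("id", "cars")
  .withColumn("cars", mapToString($"cars"))
  .write.mode("overwrite").csv("/tmp/output")
{code}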




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15112) Dataset filter returns garbage

2016-05-04 Thread Suresh Thalamati (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15271576#comment-15271576
 ] 

Suresh Thalamati commented on SPARK-15112:
--

I ran into a similar issue: SPARK-14218.

> Dataset filter returns garbage
> --
>
> Key: SPARK-15112
> URL: https://issues.apache.org/jira/browse/SPARK-15112
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Reynold Xin
>Priority: Blocker
> Attachments: demo 1 dataset - Databricks.htm
>
>
> See the following notebook:
> https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/6122906529858466/2727501386611535/5382278320999420/latest.html
> I think it happens only when using JSON. I'm also going to attach it to the 
> ticket.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15125) CSV data source recognizes empty quoted strings in the input as null.

2016-05-04 Thread Suresh Thalamati (JIRA)
Suresh Thalamati created SPARK-15125:


 Summary: CSV data source recognizes empty quoted strings in the 
input as null. 
 Key: SPARK-15125
 URL: https://issues.apache.org/jira/browse/SPARK-15125
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Suresh Thalamati


CSV data source does not differentiate between empty quoted strings and empty 
fields; both are treated as null. In some scenarios users would want to 
differentiate between these values, especially in the context of SQL where NULL 
and empty string have different meanings. If the input data happens to be a dump 
from a traditional relational data source, users will see different results for 
the same SQL queries. 

{code}
Repro:

Test Data: (test.csv)
year,make,model,comment,price
2017,Tesla,Mode 3,looks nice.,35000.99
2016,Chevy,Bolt,"",29000.00
2015,Porsche,"",,

scala> val df= sqlContext.read.format("csv").option("header", 
"true").option("inferSchema", "true").option("nullValue", 
null).load("/tmp/test.csv")
df: org.apache.spark.sql.DataFrame = [year: int, make: string ... 3 more fields]

scala> df.show
++---+--+---++
|year|   make| model|comment|   price|
++---+--+---++
|2017|  Tesla|Mode 3|looks nice.|35000.99|
|2016|  Chevy|  Bolt|   null| 29000.0|
|2015|Porsche|  null|   null|null|
++---+--+---++

Expected:
++---+--+---++
|year|   make| model|comment|   price|
++---+--+---++
|2017|  Tesla|Mode 3|looks nice.|35000.99|
|2016|  Chevy|  Bolt|   | 29000.0|
|2015|Porsche|  |   null|null|
++---+--+---++

{code}

Testing a fix for this issue. I will give a shot at submitting a PR for 
this soon. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14840) Cannot drop a table which has the name starting with 'or'

2016-04-29 Thread Suresh Thalamati (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15264616#comment-15264616
 ] 

Suresh Thalamati commented on SPARK-14840:
--

On master (2.0), this issue was recently fixed as part of SPARK-14762.

scala> sql("create table order(a int )") 
res15: org.apache.spark.sql.DataFrame = []
scala> sql("drop  table order") 
res17: org.apache.spark.sql.DataFrame = []

> Cannot drop a table which has the name starting with 'or'
> -
>
> Key: SPARK-14840
> URL: https://issues.apache.org/jira/browse/SPARK-14840
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.2
>Reporter: Kwangwoo Kim
>
> sqlContext("drop table tmp.order")  
> The above code makes error as following: 
> 6/04/22 14:27:17 INFO ParseDriver: Parsing command: drop table tmp.order
> 16/04/22 14:27:19 INFO ParseDriver: Parse Completed
> 16/04/22 14:27:19 WARN DropTable: [1.5] failure: identifier expected
> tmp.order
> ^
> java.lang.RuntimeException: [1.5] failure: identifier expected
> tmp.order
> ^
>   at scala.sys.package$.error(package.scala:27)
>   at 
> org.apache.spark.sql.catalyst.SqlParser$.parseTableIdentifier(SqlParser.scala:58)
>   at org.apache.spark.sql.SQLContext.table(SQLContext.scala:827)
>   at org.apache.spark.sql.hive.execution.DropTable.run(commands.scala:62)
>   at 
> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
>   at 
> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
>   at 
> org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
>   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
>   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
>   at org.apache.spark.sql.DataFrame.(DataFrame.scala:145)
>   at org.apache.spark.sql.DataFrame.(DataFrame.scala:130)
>   at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52)
>   at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817)
>   at 
> $line15.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:26)
>   at $line15.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:31)
>   at $line15.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:33)
>   at $line15.$read$$iwC$$iwC$$iwC$$iwC$$iwC.(:35)
>   at $line15.$read$$iwC$$iwC$$iwC$$iwC.(:37)
>   at $line15.$read$$iwC$$iwC$$iwC.(:39)
>   at $line15.$read$$iwC$$iwC.(:41)
>   at $line15.$read$$iwC.(:43)
>   at $line15.$read.(:45)
>   at $line15.$read$.(:49)
>   at $line15.$read$.()
>   at $line15.$eval$.(:7)
>   at $line15.$eval$.()
>   at $line15.$eval.$print()
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
>   at 
> org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
>   at 
> org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
>   at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
>   at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
>   at 
> org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
>   at 
> org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
>   at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
>   at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
>   at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
>   at 
> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
>   at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
>  

[jira] [Commented] (SPARK-14343) Dataframe operations on a partitioned dataset (using partition discovery) return invalid results

2016-04-18 Thread Suresh Thalamati (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246195#comment-15246195
 ] 

Suresh Thalamati commented on SPARK-14343:
--

This issue may be related to  SPARK-14463. 

> Dataframe operations on a partitioned dataset (using partition discovery) 
> return invalid results
> 
>
> Key: SPARK-14343
> URL: https://issues.apache.org/jira/browse/SPARK-14343
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1, 2.0.0
> Environment: Mac OS X 10.11.4
>Reporter: Jurriaan Pruis
>
> When reading a dataset using {{sqlContext.read.text()}} queries on the 
> partitioned column return invalid results.
> h2. How to reproduce:
> h3. Generate datasets
> {code:title=repro.sh}
> #!/bin/sh
> mkdir -p dataset/year=2014
> mkdir -p dataset/year=2015
> echo "data from 2014" > dataset/year=2014/part01.txt
> echo "data from 2015" > dataset/year=2015/part01.txt
> {code}
> {code:title=repro2.sh}
> #!/bin/sh
> mkdir -p dataset2/month=june
> mkdir -p dataset2/month=july
> echo "data from june" > dataset2/month=june/part01.txt
> echo "data from july" > dataset2/month=july/part01.txt
> {code}
> h3. using first dataset
> {code:none}
> >>> df = sqlContext.read.text('dataset')
> ...
> >>> df
> DataFrame[value: string, year: int]
> >>> df.show()
> +--++
> | value|year|
> +--++
> |data from 2014|2014|
> |data from 2015|2015|
> +--++
> >>> df.select('year').show()
> ++
> |year|
> ++
> |  14|
> |  14|
> ++
> {code}
> This is clearly wrong. Seems like it returns the length of the value column?
> h3. using second dataset
> With another dataset it looks like this:
> {code:none}
> >>> df = sqlContext.read.text('dataset2')
> >>> df
> DataFrame[value: string, month: string]
> >>> df.show()
> +--+-+
> | value|month|
> +--+-+
> |data from june| june|
> |data from july| july|
> +--+-+
> >>> df.select('month').show()
> +--+
> | month|
> +--+
> |data from june|
> |data from july|
> +--+
> {code}
> Here it returns the value of the value column instead of the month partition.
> h3. Workaround
> When I convert the dataframe to an RDD and back to a DataFrame I get the 
> following result (which is the expected behaviour):
> {code:none}
> >>> df.rdd.toDF().select('month').show()
> +-+
> |month|
> +-+
> | june|
> | july|
> +-+
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14586) SparkSQL doesn't parse decimal like Hive

2016-04-15 Thread Suresh Thalamati (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243702#comment-15243702
 ] 

Suresh Thalamati commented on SPARK-14586:
--

Thanks for reporting this issue, Stephane. Which version of Hive are you 
using? I took a quick look at the code; here is what I found:

The type decimal(4,2) will map to BigDecimal, not double. BigDecimal parsing fails 
if there are leading spaces:
{code}
scala> BigDecimal(" 2.0")
java.lang.NumberFormatException
  at java.math.BigDecimal.(BigDecimal.java:494)
  at java.math.BigDecimal.(BigDecimal.java:383)
{code}
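
Trimming the input before constructing the BigDecimal avoids the failure, which is 
essentially what the Hive fix referenced below does:
{code}
scala> BigDecimal(" 2.0".trim)
res1: scala.math.BigDecimal = 2.0
{code}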

Spark SQL also relies on HiveDecimal to convert the string to a BigDecimal 
value. 
Hive made a fix in the 2.0 release to trim the decimal input string: 
https://issues.apache.org/jira/browse/HIVE-12343
https://issues.apache.org/jira/browse/HIVE-10799
commit : 
https://github.com/apache/hive/commit/c178a6e9d12055e5bde634123ca58f243ae39477
{code}
 common/src/java/org/apache/hadoop/hive/common/type/HiveDecimal.java 
   public static HiveDecimal create(String dec) {
 BigDecimal bd;
 try {
-  bd = new BigDecimal(dec);
+  bd = new BigDecimal(dec.trim());
 } catch (NumberFormatException ex) {
   return null;
 }
{code}

When Spark moves to the 2.0 version of Hive, decimal parsing should behave the 
same as Hive. I am not sure about the plans to upgrade the Hive version inside 
Spark. Copying Yin Huai. 

[~yhuai]



> SparkSQL doesn't parse decimal like Hive
> 
>
> Key: SPARK-14586
> URL: https://issues.apache.org/jira/browse/SPARK-14586
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.1
>Reporter: Stephane Maarek
>
> create a test_data.csv with the following
> {code:none}
> a, 2.0
> ,3.0
> {code}
> (the space is intended before the 2)
> copy the test_data.csv to hdfs:///spark_testing_2
> go in hive, run the following statements
> {code:sql}
> CREATE SCHEMA IF NOT EXISTS spark_testing;
> DROP TABLE IF EXISTS spark_testing.test_csv_2;
> CREATE EXTERNAL TABLE `spark_testing.test_csv_2`(
>   column_1 varchar(10),
>   column_2 decimal(4,2))
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ','
> STORED AS TEXTFILE LOCATION '/spark_testing_2'
> TBLPROPERTIES('serialization.null.format'='');
> select * from spark_testing.test_csv_2;
> OK
> a   2
> NULL3
> {code}
> As you can see, the value " 2" gets parsed correctly to 2
> Now onto Spark-shell:
> {code:java}
> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
> sqlContext.sql("select * from spark_testing.test_csv_2").show()
> +++
> |column_1|column_2|
> +++
> |   a|null|
> |null|3.00|
> +++
> {code}
> As you can see, the " 2" got parsed to null. Therefore Hive and Spark don't 
> have a similar parsing behavior for decimals. I wouldn't say it is a bug per 
> se, but it looks like a necessary improvement for the two engines to 
> converge. Hive version is 1.5.1
> Not sure if relevant, but Scala does parse numbers with leading space 
> correctly
> {code}
> scala> "2.0".toDouble
> res21: Double = 2.0
> scala> " 2.0".toDouble
> res22: Double = 2.0
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14218) dataset show() does not display column names in the correct order if underlying data frame schema order is different from the encoder schema order.

2016-03-28 Thread Suresh Thalamati (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15215098#comment-15215098
 ] 

Suresh Thalamati commented on SPARK-14218:
--

I am giving a shot at submitting a PR for this issue. 

> dataset show() does not display column names in the correct order if 
> underlying data frame schema order is different from the encoder schema 
> order. 
> 
>
> Key: SPARK-14218
> URL: https://issues.apache.org/jira/browse/SPARK-14218
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>    Reporter: Suresh Thalamati
>
> dataset show does not output column names correctly if the encoder schema 
> order is different from the underlying data frame schema order. 
> Repro:
> {code}
> case class emp(id: Int, name:String)
> val df = sqlContext.sql("select 'Mike', 2").toDF("name", "id")
> val ds = df.as[emp]
> ds.show
> +----+----+
> |name|  id|
> +----+----+
> |   2|Mike|
> +----+----+
> {code}
> Output column names should be "id, name" to match the data correctly.
> This works correctly in Spark 1.6.  Output in Spark 1.6:
> {code}
> scala> ds.show
> +---+----+
> | id|name|
> +---+----+
> |  2|Mike|
> +---+----+
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-14218) dataset show() does not display column names in the correct order if underlying data frame schema order is different from the encoder schema order.

2016-03-28 Thread Suresh Thalamati (JIRA)
Suresh Thalamati created SPARK-14218:


 Summary: dataset show() does not display column names in the 
correct order if underlying data frame schema order is different from the 
encoder schema order. 
 Key: SPARK-14218
 URL: https://issues.apache.org/jira/browse/SPARK-14218
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Suresh Thalamati


dataset show does not output column names correctly if the encoder schema order 
is different from the underlying data frame schema order. 

Repro:
{code}
case class emp(id: Int, name:String)
val df = sqlContext.sql("select 'Mike', 2").toDF("name", "id")
val ds = df.as[emp]

scala> ds.show
+----+----+
|name|  id|
+----+----+
|   2|Mike|
+----+----+
{code}

Output column names should be "id, name" to match the data correctly.

This works correctly in Spark 1.6.  Output in Spark 1.6:
{code}
scala> ds.show
+---+----+
| id|name|
+---+----+
|  2|Mike|
+---+----+
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13860) TPCDS query 39 returns wrong results compared to TPC official result set

2016-03-19 Thread Suresh Thalamati (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200330#comment-15200330
 ] 

Suresh Thalamati commented on SPARK-13860:
--

I looked into this issue and found that NULL values in the input columns to the 
stddev_samp function are the cause of the NaN values in the output, which 
contributes to the difference in the results. I verified this in a couple of ways:

1) Modified the inventory table data to update the null values for 
inv_quantity_on_hand, and ran the query to make sure no NaNs are returned.

2) Tried a simplified repro on Spark SQL and Hive. The following is the output:

--Spark SQL 
{code}
drop table if exists foo;
create table foo (c1 int  , c2 double);
insert into foo select 1, NULL;
insert into foo select 1, 1;
insert into foo select 2, NULL;
insert into foo select 3,3;
insert into foo select 3,4;
insert into foo select 4,4;

select * from foo
1   NULL
1   1.0
2   NULL
3   3.0
3   4.0
4   4.0
Time taken: 0.168 seconds, Fetched 6 row(s)

select c1, stddev_samp(c2) from foo group by c1;

1   NaN
3   0.7071067811865476
4   NaN
2   NULL
Time taken: 0.983 seconds, Fetched 4 row(s)

{code}
--Hive  Results 
{code}
drop table if exists foo;
create table foo (c1 int  , c2 double);
insert into foo values (1, NULL);
insert into foo values(1, 1);
insert into foo values(2, NULL);
insert into foo values(3,3);
insert into foo values(3,4);
insert into foo values(4,4);
select * from foo;

hive> select * from foo;
OK
1   NULL
1   1.0
2   NULL
3   3.0
3   4.0
4   4.0
Time taken: 0.072 seconds, Fetched: 6 row(s)

select c1 , stddev_samp(c2) from foo group by c1;

OK
1   0.0
2   NULL
3   0.7071067811865476
4   0.0
Time taken: 16.696 seconds, Fetched: 4 row(s)

{code}

It looks like Hive is returning 0.0 if there are NULL values in the set, and 
Spark SQL is returning NaN.
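
If downstream consumers expect Hive's 0.0 in these cases, one possible workaround 
on the Spark side (a sketch only, using the foo table above) is to map NaN back to 
a default with nanvl:
{code}
select c1, nanvl(stddev_samp(c2), 0.0) as sd from foo group by c1;
{code}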

[~yhuai] [~mengxr]

Is the NaN output the expected Spark behavior? I would appreciate your input.
 

> TPCDS query 39 returns wrong results compared to TPC official result set 
> -
>
> Key: SPARK-13860
> URL: https://issues.apache.org/jira/browse/SPARK-13860
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: JESSE CHEN
>  Labels: tpcds-result-mismatch
>
> Testing Spark SQL using TPC queries. Query 39 returns wrong results compared 
> to official result set. This is at 1GB SF (validation run).
> q39a - 3 extra rows in SparkSQL output (eg. 
> [1,1155,1,184.0,NaN,1,1155,2,343.3,1.1700233592269733])   ;  q39b 
> - 3 extra rows in SparkSQL output (eg. 
> [1,1155,1,184.0,NaN,1,1155,2,343.3,1.1700233592269733])
> Actual results 39a:
> {noformat}
> [1,265,1,324.75,1.2438391781531353,1,265,2,329.0,1.0151581328149208]
> [1,363,1,499.5,1.031941572270649,1,363,2,321.0,1.1411766752007977]
> [1,679,1,373.75,1.0955498064867504,1,679,2,417.5,1.042970994259454]
> [1,695,1,450.75,1.0835888283564505,1,695,2,368.75,1.1356494125569416]
> [1,789,1,357.25,1.03450938027956,1,789,2,410.0,1.0284221852702604]
> [1,815,1,216.5,1.1702270938111008,1,815,2,150.5,1.3057281471249382]
> [1,827,1,271.75,1.1046890134130438,1,827,2,424.75,1.1653198631238286]
> [1,1041,1,382.5,1.284808399803008,1,1041,2,424.75,1.000577271456812]
> [1,1155,1,184.0,NaN,1,1155,2,343.3,1.1700233592269733]
> [1,1569,1,212.0,1.630213519639535,1,1569,2,239.25,1.2641513267800557]
> [1,1623,1,338.25,1.1285483279713715,1,1623,2,261.3,1.2717809002195564]
> [1,2581,1,448.5,1.060429041250449,1,2581,2,476.25,1.0362984739390064]
> [1,2705,1,246.25,1.0120308357959693,1,2705,2,294.7,1.0742134101583702]
> [1,3131,1,393.75,1.0037613982687346,1,3131,2,480.5,1.0669144981482768]
> [1,3291,1,374.5,1.195189833087008,1,3291,2,265.25,1.572972106948466]
> [1,3687,1,279.75,1.4260909081999698,1,3687,2,157.25,1.4534340882531784]
> [1,4955,1,495.25,1.0318296151625301,1,4955,2,322.5,1.1693842343776149]
> [1,5627,1,282.75,1.5657032366359889,1,5627,2,297.5,1.2084286841430678]
> [1,7017,1,175.5,1.0427454215644427,1,7017,2,321.3,1.0183356932936254]
> [1,7317,1,366.3,1.025466403613547,1,7317,2,378.0,1.2172513189920555]
> [1,7569,1,430.5,1.0874396852180854,1,7569,2,360.25,1.047005559314515]
> [1,7999,1,166.25,1.7924231710846223,1,7999,2,375.3,1.008092263550718]
> [1,8319,1,306.75,1.1615378040478215,1,8319,2,276.0,1.1420996385609428]
> [1,8443,1,327.75,1.256718374192724,1,8443,2,332.5,1.0044167259988928]
> [1,8583,1,319.5,1.024108893111539,1,8583,2,310.25,1.2358813775861328]
> [

[jira] [Commented] (SPARK-13860) TPCDS query 39 returns wrong results compared to TPC official result set

2016-03-19 Thread Suresh Thalamati (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200805#comment-15200805
 ] 

Suresh Thalamati commented on SPARK-13860:
--

[~yhuai] [~mengxr]

I noticed there is a discussion on the following PR about returning NaN for the 
empty-set case for stddev: 
https://github.com/apache/spark/pull/9705

Should we address the stddev inconsistency with Hive, or stay with the 
current behavior?

> TPCDS query 39 returns wrong results compared to TPC official result set 
> -
>
> Key: SPARK-13860
> URL: https://issues.apache.org/jira/browse/SPARK-13860
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: JESSE CHEN
>  Labels: tpcds-result-mismatch
>
> Testing Spark SQL using TPC queries. Query 39 returns wrong results compared 
> to official result set. This is at 1GB SF (validation run).
> q39a - 3 extra rows in SparkSQL output (eg. 
> [1,1155,1,184.0,NaN,1,1155,2,343.3,1.1700233592269733])   ;  q39b 
> - 3 extra rows in SparkSQL output (eg. 
> [1,1155,1,184.0,NaN,1,1155,2,343.3,1.1700233592269733])
> Actual results 39a:
> {noformat}
> [1,265,1,324.75,1.2438391781531353,1,265,2,329.0,1.0151581328149208]
> [1,363,1,499.5,1.031941572270649,1,363,2,321.0,1.1411766752007977]
> [1,679,1,373.75,1.0955498064867504,1,679,2,417.5,1.042970994259454]
> [1,695,1,450.75,1.0835888283564505,1,695,2,368.75,1.1356494125569416]
> [1,789,1,357.25,1.03450938027956,1,789,2,410.0,1.0284221852702604]
> [1,815,1,216.5,1.1702270938111008,1,815,2,150.5,1.3057281471249382]
> [1,827,1,271.75,1.1046890134130438,1,827,2,424.75,1.1653198631238286]
> [1,1041,1,382.5,1.284808399803008,1,1041,2,424.75,1.000577271456812]
> [1,1155,1,184.0,NaN,1,1155,2,343.3,1.1700233592269733]
> [1,1569,1,212.0,1.630213519639535,1,1569,2,239.25,1.2641513267800557]
> [1,1623,1,338.25,1.1285483279713715,1,1623,2,261.3,1.2717809002195564]
> [1,2581,1,448.5,1.060429041250449,1,2581,2,476.25,1.0362984739390064]
> [1,2705,1,246.25,1.0120308357959693,1,2705,2,294.7,1.0742134101583702]
> [1,3131,1,393.75,1.0037613982687346,1,3131,2,480.5,1.0669144981482768]
> [1,3291,1,374.5,1.195189833087008,1,3291,2,265.25,1.572972106948466]
> [1,3687,1,279.75,1.4260909081999698,1,3687,2,157.25,1.4534340882531784]
> [1,4955,1,495.25,1.0318296151625301,1,4955,2,322.5,1.1693842343776149]
> [1,5627,1,282.75,1.5657032366359889,1,5627,2,297.5,1.2084286841430678]
> [1,7017,1,175.5,1.0427454215644427,1,7017,2,321.3,1.0183356932936254]
> [1,7317,1,366.3,1.025466403613547,1,7317,2,378.0,1.2172513189920555]
> [1,7569,1,430.5,1.0874396852180854,1,7569,2,360.25,1.047005559314515]
> [1,7999,1,166.25,1.7924231710846223,1,7999,2,375.3,1.008092263550718]
> [1,8319,1,306.75,1.1615378040478215,1,8319,2,276.0,1.1420996385609428]
> [1,8443,1,327.75,1.256718374192724,1,8443,2,332.5,1.0044167259988928]
> [1,8583,1,319.5,1.024108893111539,1,8583,2,310.25,1.2358813775861328]
> [1,8591,1,398.0,1.1478168692042447,1,8591,2,355.75,1.0024472149348966]
> [1,8611,1,300.5,1.5191545184147954,1,8611,2,243.75,1.2342122780960432]
> [1,9081,1,367.0,1.0878932141280895,1,9081,2,435.0,1.0330530776324107]
> [1,9357,1,351.7,1.1902922622025887,1,9357,2,427.0,1.0438583026358363]
> [1,9449,1,406.25,1.0183183104803557,1,9449,2,175.0,1.0544779796296408]
> [1,9713,1,242.5,1.1035044355064203,1,9713,2,393.0,1.208474608738988]
> [1,9809,1,479.0,1.0189602512117633,1,9809,2,317.5,1.0614142074924882]
> [1,9993,1,417.75,1.0099832672435247,1,9993,2,204.5,1.552870745350107]
> [1,10127,1,239.75,1.0561770587198123,1,10127,2,359.25,1.1857980403742183]
> [1,11159,1,407.25,1.0785507154337637,1,11159,2,250.0,1.334757905639321]
> [1,11277,1,211.25,1.2615858275316627,1,11277,2,330.75,1.0808767951625093]
> [1,11937,1,344.5,1.085804026843784,1,11937,2,200.34,1.0638527063883725]
> [1,12373,1,387.75,1.1014904822941258,1,12373,2,306.0,1.0761744390394028]
> [1,12471,1,365.25,1.0607570183728479,1,12471,2,327.25,1.0547560580567852]
> [1,12625,1,279.0,1.3016560542373208,1,12625,2,443.25,1.0604958838068959]
> [1,12751,1,280.75,1.10833057888089,1,12751,2,369.3,1.3416504398884601]
> [1,12779,1,331.0,1.041690207320035,1,12779,2,359.0,1.028978056175258]
> [1,13077,1,367.7,1.345523904195734,1,13077,2,358.7,1.5132429058096555]
> [1,13191,1,260.25,1.063569632291568,1,13191,2,405.0,1.0197999172180061]
> [1,13561,1,335.25,1.2609616961776389,1,13561,2,240.0,1.0513604502245155]
> [1,13935,1,311.75,1.

[jira] [Commented] (SPARK-13820) TPC-DS Query 10 fails to compile

2016-03-15 Thread Suresh Thalamati (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15196419#comment-15196419
 ] 

Suresh Thalamati commented on SPARK-13820:
--

This query contains a correlated subquery, which is not yet supported in Spark SQL.  

[~davies] I saw your PR https://github.com/apache/spark/pull/10706 for this 
kind of query; are you planning to merge it for 2.0?
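
Until correlated subqueries are supported, a possible workaround (a sketch only; I 
have not validated it against the official result set) is to replace the correlated 
EXISTS predicate with another LEFT SEMI JOIN, mirroring the ss_wh1 join the query 
already uses:
{code}
LEFT SEMI JOIN (
  select ws_bill_customer_sk as customer_sk
  from web_sales, date_dim
  where web_sales.ws_sold_date_sk = date_dim.d_date_sk
    and d_year = 2002 and d_moy between 1 and 1+3
  UNION ALL
  select cs_ship_customer_sk as customer_sk
  from catalog_sales, date_dim
  where catalog_sales.cs_sold_date_sk = date_dim.d_date_sk
    and d_year = 2002 and d_moy between 1 and 1+3
) tmp ON c.c_customer_sk = tmp.customer_sk
{code}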

> TPC-DS Query 10 fails to compile
> 
>
> Key: SPARK-13820
> URL: https://issues.apache.org/jira/browse/SPARK-13820
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1
> Environment: Red Hat Enterprise Linux Server release 7.1 (Maipo)
> Linux bigaperf116.svl.ibm.com 3.10.0-229.el7.x86_64 #1 SMP Thu Jan 29 
> 18:37:38 EST 2015 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Roy Cecil
>
> TPC-DS Query 10 fails to compile with the following error.
> Parsing error: KW_SELECT )=> ( KW_EXISTS subQueryExpression ) -> ^( 
> TOK_SUBQUERY_EXPR ^( TOK_SUBQUERY_OP KW_EXISTS ) subQueryExpression ) );])
> at org.antlr.runtime.DFA.noViableAlt(DFA.java:158)
> at org.antlr.runtime.DFA.predict(DFA.java:144)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceEqualExpression(HiveParser_IdentifiersParser.java:8155)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceNotExpression(HiveParser_IdentifiersParser.java:9177)
> Parsing error: KW_SELECT )=> ( KW_EXISTS subQueryExpression ) -> ^( 
> TOK_SUBQUERY_EXPR ^( TOK_SUBQUERY_OP KW_EXISTS ) subQueryExpression ) );])
> at org.antlr.runtime.DFA.noViableAlt(DFA.java:158)
> at org.antlr.runtime.DFA.predict(DFA.java:144)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceEqualExpression(HiveParser_IdentifiersParser.java:8155)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceNotExpression(HiveParser_IdentifiersParser.java:9177)
> Query is pasted here for easy reproduction
>  select
>   cd_gender,
>   cd_marital_status,
>   cd_education_status,
>   count(*) cnt1,
>   cd_purchase_estimate,
>   count(*) cnt2,
>   cd_credit_rating,
>   count(*) cnt3,
>   cd_dep_count,
>   count(*) cnt4,
>   cd_dep_employed_count,
>   count(*) cnt5,
>   cd_dep_college_count,
>   count(*) cnt6
>  from
>   customer c
>   JOIN customer_address ca ON c.c_current_addr_sk = ca.ca_address_sk
>   JOIN customer_demographics ON cd_demo_sk = c.c_current_cdemo_sk
>   LEFT SEMI JOIN (select ss_customer_sk
>   from store_sales
>JOIN date_dim ON ss_sold_date_sk = d_date_sk
>   where
> d_year = 2002 and
> d_moy between 1 and 1+3) ss_wh1 ON c.c_customer_sk = 
> ss_wh1.ss_customer_sk
>  where
>   ca_county in ('Rush County','Toole County','Jefferson County','Dona Ana 
> County','La Porte County') and
>exists (
> select tmp.customer_sk from (
> select ws_bill_customer_sk as customer_sk
> from web_sales,date_dim
> where
>   web_sales.ws_sold_date_sk = date_dim.d_date_sk and
>   d_year = 2002 and
>   d_moy between 1 and 1+3
> UNION ALL
> select cs_ship_customer_sk as customer_sk
> from catalog_sales,date_dim
> where
>   catalog_sales.cs_sold_date_sk = date_dim.d_date_sk and
>   d_year = 2002 and
>   d_moy between 1 and 1+3
>   ) tmp where c.c_customer_sk = tmp.customer_sk
> )
>  group by cd_gender,
>   cd_marital_status,
>   cd_education_status,
>   cd_purchase_estimate,
>   cd_credit_rating,
>   cd_dep_count,
>   cd_dep_employed_count,
>   cd_dep_college_count
>  order by cd_gender,
>   cd_marital_status,
>   cd_education_status,
>   cd_purchase_estimate,
>   cd_credit_rating,
>   cd_dep_count,
>   cd_dep_employed_count,
>   cd_dep_college_count
>   limit 100;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: Microsoft SQL dialect issues

2016-03-15 Thread Suresh Thalamati
You should be able to register your own dialect if the default mappings are 
not working for your scenario.

import org.apache.spark.sql.jdbc.JdbcDialects
JdbcDialects.registerDialect(MyDialect)
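
For example, a minimal SQL Server dialect might look like the following (a sketch 
only; the object name and the type overrides are illustrative, not an existing 
dialect, so adjust the mappings to whatever your schema needs):

import java.sql.Types
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects, JdbcType}
import org.apache.spark.sql.types._

case object MyDialect extends JdbcDialect {
  // Claim JDBC URLs that point at SQL Server.
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:sqlserver")

  // Write-side override: store StringType as NVARCHAR(255) instead of the default TEXT.
  override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
    case StringType => Some(JdbcType("NVARCHAR(255)", Types.NVARCHAR))
    case _ => None
  }
}

JdbcDialects.registerDialect(MyDialect)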

Please refer to JdbcDialects to find examples of the existing default dialects 
for your database or other databases:
https://github.com/apache/spark/blob/branch-1.6/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala
https://github.com/apache/spark/tree/branch-1.6/sql/core/src/main/scala/org/apache/spark/sql/jdbc

> On Mar 15, 2016, at 12:41 PM, Mich Talebzadeh  
> wrote:
> 
> Hi,
> 
> Can you please clarify what you are trying to achieve and I guess you mean 
> Transact_SQL for MSSQL?
> 
> HTH
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> 
>  
> http://talebzadehmich.wordpress.com 
>  
> 
> On 15 March 2016 at 19:09, Andrés Ivaldi  > wrote:
> Hello, I'm trying to use MSSQL, storing data on MSSQL but i'm having dialect 
> problems
> I found this
> https://mail-archives.apache.org/mod_mbox/spark-issues/201510.mbox/%3cjira.12901078.1443461051000.34556.1444123886...@atlassian.jira%3E
>  
> 
> 
> That is what is happening to me, It's possible to define the dialect? so I 
> can override the default for SQLServer?
> 
> Regards. 
> 
> -- 
> Ing. Ivaldi Andres
> 



[jira] [Commented] (SPARK-13699) Spark SQL drops the table in "overwrite" mode while writing into table

2016-03-07 Thread Suresh Thalamati (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15183996#comment-15183996
 ] 

Suresh Thalamati commented on SPARK-13699:
--

Thank you for providing the reproduction of the problem; I was able to 
reproduce the issue. The problem is that you are trying to overwrite a table that 
is also being read in the data frame. This is not allowed, and it should fail with 
an error (I noticed in some cases I get the error 
org.apache.spark.sql.AnalysisException: Cannot overwrite table `t1` that is 
also being read from). I think this usage should raise an error. 

Truncate is an interesting option, especially with the JDBC data source. But it 
will not address the problem you are running into; it will run into the same 
problem as Overwrite.
 

{code}

scala> tgtFinal.explain
== Physical Plan ==
Union
:- WholeStageCodegen
:  :  +- Project 
[col1#223,col2#224,col3#225,col4#226,batchid#227,currind#228,startdate#229,cast(enddate#230
 as string) AS enddate#263,updatedate#231]
:  : +- Filter (currind#228 = N)
:  :+- INPUT
:  +- HiveTableScan 
[enddate#230,updatedate#231,col2#224,col1#223,batchid#227,col3#225,startdate#229,currind#228,col4#226],
 MetastoreRelation default, tgt_table, None
:- WholeStageCodegen
:  :  +- Project 
[col1#223,col2#224,col3#225,col4#226,batchid#227,currind#228,startdate#229,cast(enddate#230
 as string) AS enddate#264,updatedate#231]
:  : +- INPUT
:  +- Except
: :- WholeStageCodegen
: :  :  +- Filter (currind#228 = Y)
: :  : +- INPUT
: :  +- HiveTableScan 
[col1#223,col2#224,col3#225,col4#226,batchid#227,currind#228,startdate#229,enddate#230,updatedate#231],
 MetastoreRelation default, tgt_table, None
: +- WholeStageCodegen
::  +- Project 
[col1#223,col2#224,col3#225,col4#226,batchid#227,currind#228,startdate#229,enddate#230,updatedate#231]
:: +- BroadcastHashJoin [cast(col1#223 as double)], [cast(col1#219 
as double)], Inner, BuildRight, None
:::- Filter (currind#228 = Y)
:::  +- INPUT
::+- INPUT
::- HiveTableScan 
[col1#223,col2#224,col3#225,col4#226,batchid#227,currind#228,startdate#229,enddate#230,updatedate#231],
 MetastoreRelation default, tgt_table, None
:+- HiveTableScan [col1#219], MetastoreRelation default, src_table, None
:- WholeStageCodegen
:  :  +- Project [col1#223,col2#224,col3#225,col4#226,batchid#227,UDF(col1#223) 
AS currInd#232,startdate#229,2016-03-07 15:12:20.584 AS 
endDate#265,1457392340584000 AS updateDate#234]
:  : +- BroadcastHashJoin [cast(col1#223 as double)], [cast(col1#219 as 
double)], Inner, BuildRight, None
:  ::- Project 
[col3#225,startdate#229,col2#224,col1#223,batchid#227,col4#226]
:  ::  +- Filter (currind#228 = Y)
:  :: +- INPUT
:  :+- INPUT
:  :- HiveTableScan 
[col3#225,startdate#229,col2#224,col1#223,batchid#227,col4#226,currind#228], 
MetastoreRelation default, tgt_table, None
:  +- HiveTableScan [col1#219], MetastoreRelation default, src_table, None
+- WholeStageCodegen
   :  +- Project [cast(col1#219 as string) AS 
col1#266,col2#220,col3#221,col4#222,UDF(cast(col1#219 as string)) AS 
batchId#235,UDF(cast(col1#219 as string)) AS currInd#236,1457392340584000 AS 
startDate#237,date_format(cast(UDF(cast(col1#219 as string)) as 
timestamp),-MM-dd HH:mm:ss) AS endDate#238,1457392340584000 AS 
updateDate#239]
   : +- INPUT
   +- HiveTableScan [col1#219,col2#220,col3#221,col4#222], MetastoreRelation 
default, src_table, None

scala> 
{code}
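
One possible workaround sketch (assuming the tgtFinal data frame and tgt_table from 
the report; untested against your exact pipeline) is to materialize the result into 
a staging table first, so the target is no longer being read while it is overwritten:
{code}
import org.apache.spark.sql.SaveMode

// Stage the result, then rebuild the target from the staged copy.
tgtFinal.write.mode(SaveMode.Overwrite).saveAsTable("tgt_table_stage")
sqlContext.table("tgt_table_stage").write.mode(SaveMode.Overwrite).saveAsTable("tgt_table")
sqlContext.sql("DROP TABLE tgt_table_stage")
{code}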

> Spark SQL drops the table in "overwrite" mode while writing into table
> --
>
> Key: SPARK-13699
> URL: https://issues.apache.org/jira/browse/SPARK-13699
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Dhaval Modi
> Attachments: stackTrace.txt
>
>
> Hi,
> While writing the dataframe to HIVE table with "SaveMode.Overwrite" option.
> E.g.
> tgtFinal.write.mode(SaveMode.Overwrite).saveAsTable("tgt_table")
> sqlContext drop the table instead of truncating.
> This is causing error while overwriting.
> Adding stacktrace & commands to reproduce the issue,
> Thanks & Regards,
> Dhaval



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: Error reading a CSV

2016-02-24 Thread Suresh Thalamati
Try creating the /user/hive/warehouse/ directory if it does not exist, and check 
that it has write permission for the user. Note the lower case ‘user’ in the path.  

> On Feb 24, 2016, at 2:42 PM, skunkwerk  wrote:
> 
> I have downloaded the Spark binary with Hadoop 2.6.
> When I run the spark-sql program like this with the CSV library:
> ./bin/spark-sql --packages com.databricks:spark-csv_2.11:1.3.0
> 
> I get into the console for spark-sql.
> However, when I try to import a CSV file from my local filesystem:
> 
> CREATE TABLE customerview USING com.databricks.spark.csv OPTIONS (path
> "/Users/imran/Downloads/test.csv", header "true", inferSchema "true");
> 
> I get the following error:
> org.apache.hadoop.hive.ql.metadata.HiveException:
> MetaException(message:file:/user/hive/warehouse/test is not a directory or
> unable to create one)
> 
> http://pastebin.com/BfyVv14U
> 
> How can I fix this?
> 
> thanks
> 
> 
> 
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/Error-reading-a-CSV-tp26329.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
> 
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
> 


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Welcoming two new committers

2016-02-08 Thread Suresh Thalamati
Congratulations Herman and Wenchen!

On Mon, Feb 8, 2016 at 10:59 AM, Andrew Or  wrote:

> Welcome!
>
> 2016-02-08 10:55 GMT-08:00 Bhupendra Mishra :
>
>> Congratulations to both. and welcome to group.
>>
>> On Mon, Feb 8, 2016 at 10:45 PM, Matei Zaharia 
>> wrote:
>>
>>> Hi all,
>>>
>>> The PMC has recently added two new Spark committers -- Herman van Hovell
>>> and Wenchen Fan. Both have been heavily involved in Spark SQL and Tungsten,
>>> adding new features, optimizations and APIs. Please join me in welcoming
>>> Herman and Wenchen.
>>>
>>> Matei
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: dev-h...@spark.apache.org
>>>
>>>
>>
>


[jira] [Created] (SPARK-13167) JDBC data source does not include null value partition columns rows in the result.

2016-02-03 Thread Suresh Thalamati (JIRA)
Suresh Thalamati created SPARK-13167:


 Summary: JDBC data source does not include null value partition 
columns rows in the result.
 Key: SPARK-13167
 URL: https://issues.apache.org/jira/browse/SPARK-13167
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.6.0, 2.0.0
Reporter: Suresh Thalamati


Reading from a JDBC data source using a partition column that is nullable can 
return an incorrect number of rows if there are rows with a null value for the 
partition column.

{code}
val emp = 
sqlContext.read.jdbc("jdbc:h2:mem:testdb0;user=testUser;password=testPass", 
"TEST.EMP", "theid", 0, 4, 3, new Properties)
emp.count()
{code}

The above jdbc read call sets up the partitions of the following form. It does not 
include a null predicate.

{code}
JDBCPartition(THEID < 1,0),JDBCPartition(THEID >= 1 AND THEID < 
2,1),JDBCPartition(THEID >= 2,2)
{code}

Rows with null values in the partition column are not included in the results 
because the partition predicates do not include an IS NULL predicate.
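
Until the partitioning logic accounts for nulls, a possible workaround (a sketch 
only, using the TEST.EMP table from the repro above) is to pass the partition 
predicates explicitly, including an IS NULL slice, via the predicates overload of 
read.jdbc:
{code}
import java.util.Properties

val parts = Array(
  "THEID < 1 OR THEID IS NULL",
  "THEID >= 1 AND THEID < 2",
  "THEID >= 2")

val emp = sqlContext.read.jdbc(
  "jdbc:h2:mem:testdb0;user=testUser;password=testPass",
  "TEST.EMP", parts, new Properties)
emp.count()
{code}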





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13167) JDBC data source does not include null value partition columns rows in the result.

2016-02-03 Thread Suresh Thalamati (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15131258#comment-15131258
 ] 

Suresh Thalamati commented on SPARK-13167:
--

I am working on a fix for this issue. 

> JDBC data source does not include null value partition columns rows in the 
> result.
> --
>
> Key: SPARK-13167
> URL: https://issues.apache.org/jira/browse/SPARK-13167
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0, 2.0.0
>    Reporter: Suresh Thalamati
>
> Reading from a JDBC data source using a partition column that is nullable 
> can return an incorrect number of rows if there are rows with a null value for 
> the partition column.
> {code}
> val emp = 
> sqlContext.read.jdbc("jdbc:h2:mem:testdb0;user=testUser;password=testPass", 
> "TEST.EMP", "theid", 0, 4, 3, new Properties)
> emp.count()
> {code}
> Above jdbc read call sets up the partitions of the following form. It does 
> not include null predicate.
> {code}
> JDBCPartition(THEID < 1,0),JDBCPartition(THEID >= 1 AND THEID < 
> 2,1),JDBCPartition(THEID >= 2,2)
> {code}
> Rows with null values in partition column are not included in the results 
> because the partition predicate does not specify is null predicates.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-12504) JDBC data source credentials are not masked in the data frame explain output.

2015-12-23 Thread Suresh Thalamati (JIRA)
Suresh Thalamati created SPARK-12504:


 Summary: JDBC data source credentials are not masked in the data 
frame explain output.
 Key: SPARK-12504
 URL: https://issues.apache.org/jira/browse/SPARK-12504
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Suresh Thalamati


Currently JDBC data source credentials are not masked in the explain output. 
This can lead to accidental leakage of credentials into logs and the UI.

SPARK-11206 added support for showing the SQL plan details in the History 
server. After this change, query plans are also written to the event logs on 
disk when event logging is enabled; in this case credentials will leak into the 
event logs, which can be accessed by file system admins.

Repro :
val empdf = sqlContext.read.jdbc("jdbc:postgresql://localhost:5432/mydb", 
"spark_emp", psqlProps)
empdf.explain(true)

Plan output with credentials :
== Parsed Logical Plan == +details

== Parsed Logical Plan ==
Limit 21
+- Relation[id#4,name#5] 
JDBCRelation(jdbc:postgresql://localhost:5432/mydb,spark_emp,[Lorg.apache.spark.Partition;@3ff74546,{user=dbuser,
 password=pwdata})

== Analyzed Logical Plan ==
id: int, name: string
Limit 21
+- Relation[id#4,name#5] 
JDBCRelation(jdbc:postgresql://localhost:5432/mydb,spark_emp,[Lorg.apache.spark.Partition;@3ff74546,{user=dbuser,
 password=pwdata})

== Optimized Logical Plan ==
Limit 21
+- Relation[id#4,name#5] 
JDBCRelation(jdbc:postgresql://localhost:5432/mydb,spark_emp,[Lorg.apache.spark.Partition;@3ff74546,{user=dbuser,
 password=pwdata})

== Physical Plan ==
Limit 21
+- Scan 
JDBCRelation(jdbc:postgresql://localhost:5432/mydb,spark_emp,[Lorg.apache.spark.Partition;@3ff74546,{user=dbuser,
 password=pwdata}) PushedFilter: [] [id#4,name#5]




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12504) JDBC data source credentials are not masked in the data frame explain output.

2015-12-23 Thread Suresh Thalamati (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070002#comment-15070002
 ] 

Suresh Thalamati commented on SPARK-12504:
--

I am testing the fix for this issue and will post the PR soon. 

> JDBC data source credentials are not masked in the data frame explain output.
> -
>
> Key: SPARK-12504
> URL: https://issues.apache.org/jira/browse/SPARK-12504
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Suresh Thalamati
>
> Currently JDBC data source credentials are not masked in the explain output. 
> This can lead to accidental leakage of credentials into logs, and UI   
> SPARK -11206 added support for showing the SQL plan details in the History 
> server. After this change query plans are also written to the event logs in 
> the disk when event log is enabled, in this case credential will leak into 
> the event logs that can be accessed by file systems admins.
> Repro :
> val empdf = sqlContext.read.jdbc("jdbc:postgresql://localhost:5432/mydb", 
> "spark_emp", psqlProps)
> empdf.explain(true)
> Plan output with credentials :
> == Parsed Logical Plan == +details
> == Parsed Logical Plan ==
> Limit 21
> +- Relation[id#4,name#5] 
> JDBCRelation(jdbc:postgresql://localhost:5432/mydb,spark_emp,[Lorg.apache.spark.Partition;@3ff74546,{user=dbuser,
>  password=pwdata})
> == Analyzed Logical Plan ==
> id: int, name: string
> Limit 21
> +- Relation[id#4,name#5] 
> JDBCRelation(jdbc:postgresql://localhost:5432/mydb,spark_emp,[Lorg.apache.spark.Partition;@3ff74546,{user=dbuser,
>  password=pwdata})
> == Optimized Logical Plan ==
> Limit 21
> +- Relation[id#4,name#5] 
> JDBCRelation(jdbc:postgresql://localhost:5432/mydb,spark_emp,[Lorg.apache.spark.Partition;@3ff74546,{user=dbuser,
>  password=pwdata})
> == Physical Plan ==
> Limit 21
> +- Scan 
> JDBCRelation(jdbc:postgresql://localhost:5432/mydb,spark_emp,[Lorg.apache.spark.Partition;@3ff74546,{user=dbuser,
>  password=pwdata}) PushedFilter: [] [id#4,name#5]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12504) JDBC data source credentials are not masked in the data frame explain output.

2015-12-23 Thread Suresh Thalamati (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070054#comment-15070054
 ] 

Suresh Thalamati commented on SPARK-12504:
--

Pull Request:
https://github.com/apache/spark/pull/10452

> JDBC data source credentials are not masked in the data frame explain output.
> -
>
> Key: SPARK-12504
> URL: https://issues.apache.org/jira/browse/SPARK-12504
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Suresh Thalamati
>
> Currently JDBC data source credentials are not masked in the explain output. 
> This can lead to accidental leakage of credentials into logs, and UI   
> SPARK -11206 added support for showing the SQL plan details in the History 
> server. After this change query plans are also written to the event logs in 
> the disk when event log is enabled, in this case credential will leak into 
> the event logs that can be accessed by file systems admins.
> Repro :
> val empdf = sqlContext.read.jdbc("jdbc:postgresql://localhost:5432/mydb", 
> "spark_emp", psqlProps)
> empdf.explain(true)
> Plan output with credentials :
> == Parsed Logical Plan == +details
> == Parsed Logical Plan ==
> Limit 21
> +- Relation[id#4,name#5] 
> JDBCRelation(jdbc:postgresql://localhost:5432/mydb,spark_emp,[Lorg.apache.spark.Partition;@3ff74546,{user=dbuser,
>  password=pwdata})
> == Analyzed Logical Plan ==
> id: int, name: string
> Limit 21
> +- Relation[id#4,name#5] 
> JDBCRelation(jdbc:postgresql://localhost:5432/mydb,spark_emp,[Lorg.apache.spark.Partition;@3ff74546,{user=dbuser,
>  password=pwdata})
> == Optimized Logical Plan ==
> Limit 21
> +- Relation[id#4,name#5] 
> JDBCRelation(jdbc:postgresql://localhost:5432/mydb,spark_emp,[Lorg.apache.spark.Partition;@3ff74546,{user=dbuser,
>  password=pwdata})
> == Physical Plan ==
> Limit 21
> +- Scan 
> JDBCRelation(jdbc:postgresql://localhost:5432/mydb,spark_emp,[Lorg.apache.spark.Partition;@3ff74546,{user=dbuser,
>  password=pwdata}) PushedFilter: [] [id#4,name#5]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11454) DB2 dialect - map DB2 ROWID and TIMESTAMP with TIMEZONE types into valid Spark types

2015-11-13 Thread Suresh Thalamati (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004488#comment-15004488
 ] 

Suresh Thalamati commented on SPARK-11454:
--

I am looking into fixing this Jira  along with SPARK-10655 PR. 

> DB2 dialect - map DB2 ROWID and TIMESTAMP with TIMEZONE types into valid 
> Spark types
> 
>
> Key: SPARK-11454
> URL: https://issues.apache.org/jira/browse/SPARK-11454
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.3.0, 1.4.0, 1.5.0
>Reporter: Pallavi Priyadarshini
>Priority: Minor
>
> Load of DB2 data types (ROWID and TIMESTAMP with TIMEZONE) into Spark 
> DataFrames fails. 
> Plan is to map them to Spark IntegerType and TimestampType respectively.
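
A rough sketch of the read-side mapping described above, following the types
stated in this issue (the DB2 type-name string is an assumption and this is
not the merged change):

import java.sql.Types
import org.apache.spark.sql.types._

object DB2TypeMapping {
  def db2CatalystType(sqlType: Int, typeName: String): Option[DataType] =
    (sqlType, typeName.toUpperCase) match {
      case (Types.ROWID, _)                => Some(IntegerType)
      case (_, "TIMESTAMP WITH TIME ZONE") => Some(TimestampType)
      case _                               => None
    }
}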



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10521) Utilize Docker to test DB2 JDBC Dialect support

2015-11-10 Thread Suresh Thalamati (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14999799#comment-14999799
 ] 

Suresh Thalamati commented on SPARK-10521:
--

@Luciano, 

Jdbc  data sources docker tests  are re-enabled in PR : 
https://github.com/apache/spark/pull/9503


> Utilize Docker to test DB2 JDBC Dialect support
> ---
>
> Key: SPARK-10521
> URL: https://issues.apache.org/jira/browse/SPARK-10521
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.4.1, 1.5.0
>Reporter: Luciano Resende
>
> There was a discussion in SPARK-10170 around using a docker image to execute 
> the DB2 JDBC dialect tests. I will use this jira to work on providing the 
> basic image together with the test integration. We can then extend the 
> testing coverage as needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-11300) Support for string length when writing to JDBC

2015-10-30 Thread Suresh Thalamati (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14983631#comment-14983631
 ] 

Suresh Thalamati edited comment on SPARK-11300 at 10/30/15 11:59 PM:
-

Another related issue is SPARK-10849, Fix will  allow users to override data 
type  for any field. 


was (Author: tsuresh):
Another related issue is Spark-10849, Fix will  allow users to override data 
type  for any field. 

> Support for string length when writing to JDBC
> --
>
> Key: SPARK-11300
> URL: https://issues.apache.org/jira/browse/SPARK-11300
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 1.5.1
>Reporter: Maciej Bryński
>
> Right now every StringType fields are written to JDBC as TEXT.
> I'd like to have option to write it as VARCHAR(size).
> Maybe we could use StringType(size) ?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11300) Support for string length when writing to JDBC

2015-10-30 Thread Suresh Thalamati (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14983631#comment-14983631
 ] 

Suresh Thalamati commented on SPARK-11300:
--

Another related issue is Spark-10849, Fix will  allow users to override data 
type  for any field. 

> Support for string length when writing to JDBC
> --
>
> Key: SPARK-11300
> URL: https://issues.apache.org/jira/browse/SPARK-11300
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 1.5.1
>Reporter: Maciej Bryński
>
> Right now every StringType fields are written to JDBC as TEXT.
> I'd like to have option to write it as VARCHAR(size).
> Maybe we could use StringType(size) ?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10857) SQL injection bug in JdbcDialect.getTableExistsQuery()

2015-09-30 Thread Suresh Thalamati (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14937732#comment-14937732
 ] 

Suresh Thalamati commented on SPARK-10857:
--

One issue I ran into with the getSchema() call: even if Spark runs on Java 7 
or above, the JDBC driver versions customers are using may not support 
getSchema(). 

I tried on a couple of databases I had and got an error on getSchema(). It is 
possible I have old drivers. 
postgresql-9.3-1101-jdbc4.jar
/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.75.x86_64/bin/java -
Exception in thread "main" java.sql.SQLFeatureNotSupportedException: Method 
org.postgresql.jdbc4.Jdbc4Connection.getSchema() is not yet implemented.
at org.postgresql.Driver.notImplemented(Driver.java:729)
at 
org.postgresql.jdbc4.AbstractJdbc4Connection.getSchema(AbstractJdbc4Connection.java:239)

My SQL :
Implementation-Version: 5.1.17-SNAPSHOT
/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.75.x86_64/
Exception in thread "main" java.sql.SQLFeatureNotSupportedException: Not 
supported
at com.mysql.jdbc.JDBC4Connection.getSchema(JDBC4Connection.java:253)
...



> SQL injection bug in JdbcDialect.getTableExistsQuery()
> --
>
> Key: SPARK-10857
> URL: https://issues.apache.org/jira/browse/SPARK-10857
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Rick Hillegas
>Priority: Minor
>
> All of the implementations of this method involve constructing a query by 
> concatenating boilerplate text with a user-supplied name. This looks like a 
> SQL injection bug to me.
> A better solution would be to call java.sql.DatabaseMetaData.getTables() to 
> implement this method, using the catalog and schema which are available from 
> Connection.getCatalog() and Connection.getSchema(). This would not work on 
> Java 6 because Connection.getSchema() was introduced in Java 7. However, the 
> solution would work for more modern JVMs. Limiting the vulnerability to 
> obsolete JVMs would at least be an improvement over the current situation. 
> Java 6 has been end-of-lifed and is not an appropriate platform for users who 
> are concerned about security.
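
A small sketch of the metadata-based check suggested above (illustrative only;
whether it is safe to rely on depends on the driver's DatabaseMetaData
support):

import java.sql.Connection

object MetadataTableCheck {
  /** True if the JDBC metadata reports a table with the given name. */
  def tableExists(conn: Connection, tableName: String): Boolean = {
    // Null catalog/schema widens the search; Connection.getCatalog (and
    // getSchema on Java 7+) can narrow it when the driver supports them.
    val rs = conn.getMetaData.getTables(conn.getCatalog, null, tableName, null)
    try rs.next() finally rs.close()
  }
}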



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.5.1 (RC1)

2015-09-27 Thread Suresh Thalamati
+1  (non-binding.)

Tested jdbc data source, and  some of the tpc-ds queries.


[jira] [Created] (SPARK-10849) Allow user to specify database column type for data frame fields when writing data to jdbc data sources.

2015-09-27 Thread Suresh Thalamati (JIRA)
Suresh Thalamati created SPARK-10849:


 Summary: Allow user to specify database column type for data frame 
fields when writing data to jdbc data sources. 
 Key: SPARK-10849
 URL: https://issues.apache.org/jira/browse/SPARK-10849
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.5.0
Reporter: Suresh Thalamati
Priority: Minor


Mapping data frame field types to database column types is addressed to a 
large extent by adding dialects and by the maxlength option added in 
SPARK-10101 to set the VARCHAR length. 

In some cases it is hard to determine the maximum supported VARCHAR size; for 
example, DB2 z/OS VARCHAR size depends on the page size, and some databases 
also have row-size limits for VARCHAR. Defaulting to CLOB for all String 
columns will likely make reads and writes slow. 

Allowing users to specify the database type for a data frame field is useful 
when users want to fine-tune the mapping for one or two fields and are fine 
with the defaults for all other fields. 

I propose to make the following two properties available for users to set in 
the data frame metadata when writing to JDBC data sources.
database.column.type  --  column type to use for CREATE TABLE.
jdbc.column.type      --  JDBC type to use for setting null values. 

Example :
  val secdf = sc.parallelize( Array(("Apple","Revenue ..."), 
("Google","Income:123213"))).toDF("name", "report")

  val metadataBuilder = new MetadataBuilder()
  metadataBuilder.putString("database.column.type", "CLOB(100K)")
  metadataBuilder.putLong("jdbc.type", java.sql.Types.CLOB)
  val metadata = metadataBuilder.build()
  val secReportDF = secdf.withColumn("report", col("report").as("report", metadata))
  secReportDF.write.jdbc("jdbc:mysql:///secdata", "reports", mysqlProps)
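
A sketch of how the write path could honor such metadata when building the
CREATE TABLE column list (property name as proposed above; the helper is
illustrative, not an existing Spark API):

import org.apache.spark.sql.types.StructField

object ColumnTypeOverride {
  /** DDL fragment for one column, preferring the user-supplied override. */
  def columnDDL(field: StructField, defaultType: String): String = {
    val dbType =
      if (field.metadata.contains("database.column.type"))
        field.metadata.getString("database.column.type")
      else defaultType
    s"${field.name} $dbType"
  }
}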




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10849) Allow user to specify database column type for data frame fields when writing data to jdbc data sources.

2015-09-27 Thread Suresh Thalamati (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14909972#comment-14909972
 ] 

Suresh Thalamati commented on SPARK-10849:
--

I am working on creating pull request for this issue. 

> Allow user to specify database column type for data frame fields when writing 
> data to jdbc data sources. 
> -
>
> Key: SPARK-10849
> URL: https://issues.apache.org/jira/browse/SPARK-10849
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.5.0
>    Reporter: Suresh Thalamati
>Priority: Minor
>
> Mapping data frame field type to database column type is addressed to large  
> extent by  adding dialects, and Adding  maxlength option in SPARK-10101 to 
> set the  VARCHAR length size. 
> In some cases it is hard to determine max supported VARCHAR size , For 
> example DB2 Z/OS VARCHAR size depends on the page size.  And some databases 
> also has ROW SIZE limits for VARCHAR.  Specifying default CLOB for all String 
> columns  will likely make read/write slow. 
> Allowing users to specify database type corresponding to the data frame field 
> will be useful in cases where users wants to fine tune mapping for one or two 
> fields, and is fine with default for all other fields .  
> I propose to make the following two properties available for users to set in 
> the data frame metadata when writing to JDBC data sources.
> database.column.type  --  column type to use for create table.
> jdbc.column.type" --  jdbc type to  use for setting null values. 
> Example :
>   val secdf = sc.parallelize( Array(("Apple","Revenue ..."), 
> ("Google","Income:123213"))).toDF("name", "report")
>   val  metadataBuilder = new MetadataBuilder()
>   metadataBuilder.putString("database.column.type", "CLOB(100K)")
>   metadataBuilder.putLong("jdbc.type", java.sql.Types.CLOB)
>   val metadata = metadataBuilder.build()
>   val secReportDF = secdf.withColumn("report", col("report").as("report", 
> metadata))
>   secReportDF.write.jdbc("jdbc:mysql:///secdata", "reports", mysqlProps)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10756) DataFrame write to teradata using jdbc not working, tries to create table each time irrespective of table existence

2015-09-22 Thread Suresh Thalamati (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14903100#comment-14903100
 ] 

Suresh Thalamati commented on SPARK-10756:
--

This issue is similar to https://issues.apache.org/jira/browse/SPARK-9078.  Fix 
might address  saving to Teradata also. Default table exists check query is 
changed to : "SELECT * FROM $table WHERE 1=0".


> DataFrame write to teradata using jdbc not working, tries to create table 
> each time irrespective of table existence
> ---
>
> Key: SPARK-10756
> URL: https://issues.apache.org/jira/browse/SPARK-10756
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Amit
>
> DataFrame write to teradata using jdbc not working, tries to create table 
> each time irrespective of table existence. 
> Whenever it goes to persist a dataframe it checks for table existence with 
> the query "SELECT 1 FROM $table LIMIT 1", and the LIMIT keyword is not 
> supported in Teradata. So the exception Teradata throws for the keyword is 
> interpreted by Spark as "table does not exist", and Spark then runs the 
> CREATE TABLE command even though the table is present. 
> So Create table command execution fails with the exception of table already 
> exist hence saving of data frame fails 
> Below is the method of JDBCUtils class
> /**
>* Returns true if the table already exists in the JDBC database.
>*/
>   def tableExists(conn: Connection, table: String): Boolean = {
> // Somewhat hacky, but there isn't a good way to identify whether a table 
> exists for all
> // SQL database systems, considering "table" could also include the 
> database name.
> Try(conn.prepareStatement(s"SELECT 1 FROM $table LIMIT 
> 1").executeQuery().next()).isSuccess
>   }
> In case of teradata, It returns false for every save/write operation 
> irrespective of the fact that table was present.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10664) JDBC DataFrameWriter does not save data to Oracle 11 Database

2015-09-17 Thread Suresh Thalamati (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14803331#comment-14803331
 ] 

Suresh Thalamati commented on SPARK-10664:
--

Table exists case should be fixed as part of SPARK-9078 fix.  This one is fix 
is recently in the latest code line. 

> JDBC DataFrameWriter does not save data to Oracle 11 Database
> -
>
> Key: SPARK-10664
> URL: https://issues.apache.org/jira/browse/SPARK-10664
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Dmitriy Atorin
>Priority: Critical
>
> The issue is that Oracle 11 and earlier do not support the LIMIT keyword.
> The issue is here:
> 1 . Go in org.apache.spark.sql.execution.datasources.jdbc
> 2. object JdbcUtils 
> 3.
> def tableExists(conn: Connection, table: String): Boolean = {
> // Somewhat hacky, but there isn't a good way to identify whether a table 
> exists for all
> // SQL database systems, considering "table" could also include the 
> database name.
> Try(conn.prepareStatement(s"SELECT 1 FROM $table LIMIT 
> 1").executeQuery().next()).isSuccess
>   }
> I think it is better to write in this way 
> s"SELECT count(*) FROM $table WHERE 1=0"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10655) Enhance DB2 dialect to handle XML, and DECIMAL , and DECFLOAT

2015-09-16 Thread Suresh Thalamati (JIRA)
Suresh Thalamati created SPARK-10655:


 Summary: Enhance DB2 dialect to handle XML, and DECIMAL , and 
DECFLOAT
 Key: SPARK-10655
 URL: https://issues.apache.org/jira/browse/SPARK-10655
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.5.0
Reporter: Suresh Thalamati


Default type mapping does not work when reading from a DB2 table that contains 
XML or DECFLOAT columns, or when writing DECIMAL values. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10655) Enhance DB2 dialect to handle XML, and DECIMAL , and DECFLOAT

2015-09-16 Thread Suresh Thalamati (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791440#comment-14791440
 ] 

Suresh Thalamati commented on SPARK-10655:
--

I am working on pull request for this issue.

> Enhance DB2 dialect to handle XML, and DECIMAL , and DECFLOAT
> -
>
> Key: SPARK-10655
> URL: https://issues.apache.org/jira/browse/SPARK-10655
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Suresh Thalamati
>
> Default type mapping does not work when reading from a DB2 table that contains 
> XML or DECFLOAT columns, or when writing DECIMAL values. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9078) Use of non-standard LIMIT keyword in JDBC tableExists code

2015-08-28 Thread Suresh Thalamati (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720747#comment-14720747
 ] 

Suresh Thalamati commented on SPARK-9078:
-

@Bob, Reynold

I ran into the same issue when trying to write data frames into an existing 
table in DB2 database.
if you are not working on the fix, I would like to give it a try. 

Thank you for the analysis of the issue. My understanding is that the fix 
should do the following to address the table-exists problem with LIMIT syntax:

-- Add a tableExists method to the JdbcDialect interface, and allow dialects 
to override the method as required for specific databases.
-- The default implementation of the table-exists method should use 
DatabaseMetaData.getTables() to find whether the table exists. If that 
particular interface is not implemented, use the query 
"SELECT 1 FROM $table WHERE 1=0".
-- Add a table-exists method that uses the LIMIT query to the MySQL and 
Postgres dialects.

* Enhancing registering of dialects: (I think this may have to be a separate 
Jira to avoid confusion).

@Reynold : I am not understanding your comment on adding an option to pass 
through the jdbc data source. If you can give an example that will be great. 

Are you referring to something like the following ?
 df.write.option("datasource.jdbc.dialects", 
"org.apache.DerbyDialect").jdbc("jdbc:derby://server:port/SAMPLE", "emp", 
properties)
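
A sketch of the dialect-overridable check outlined above (trait and object
names are assumptions for illustration, not Spark's API): the default avoids
the non-standard LIMIT keyword, while dialects that support LIMIT can keep the
cheaper probe.

import java.sql.Connection
import scala.util.Try

trait TableExistsCheck {
  /** Query that returns an empty result set when the table exists. */
  def tableExistsQuery(table: String): String =
    s"SELECT 1 FROM $table WHERE 1=0"

  def tableExists(conn: Connection, table: String): Boolean =
    Try(conn.prepareStatement(tableExistsQuery(table)).executeQuery()).isSuccess
}

// Databases that support LIMIT can keep the original, cheaper form.
object MySQLCheck extends TableExistsCheck {
  override def tableExistsQuery(table: String): String =
    s"SELECT 1 FROM $table LIMIT 1"
}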


 Use of non-standard LIMIT keyword in JDBC tableExists code
 --

 Key: SPARK-9078
 URL: https://issues.apache.org/jira/browse/SPARK-9078
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.1, 1.4.0
Reporter: Robert Beauchemin
Priority: Minor

 tableExists in  
 spark/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcUtils.scala uses 
 non-standard SQL (specifically, the LIMIT keyword) to determine whether a 
 table exists in a JDBC data source. This will cause an exception in many/most 
 JDBC databases that doesn't support LIMIT keyword. See 
 http://stackoverflow.com/questions/1528604/how-universal-is-the-limit-statement-in-sql
 To check for table existence or an exception, it could be recrafted around 
 select 1 from $table where 0 = 1 which isn't the same (it returns an empty 
 resultset rather than the value '1'), but would support more data sources and 
 also support empty tables. Arguably ugly and possibly queries every row on 
 sources that don't support constant folding, but better than failing on JDBC 
 sources that don't support LIMIT. 
 Perhaps supports LIMIT could be a field in the JdbcDialect class for 
 databases that support keyword this to override. The ANSI standard is (OFFSET 
 and) FETCH. 
 The standard way to check for table existence would be to use 
 information_schema.tables which is a SQL standard but may not work for other 
 JDBC data sources that support SQL, but not the information_schema. The JDBC 
 DatabaseMetaData interface provides getSchemas()  that allows checking for 
 the information_schema in drivers that support it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10170) Writing from data frame into db2 database using jdbc data source api fails with error for string, and boolean column types.

2015-08-22 Thread Suresh Thalamati (JIRA)
Suresh Thalamati created SPARK-10170:


 Summary: Writing from data frame into db2 database using jdbc data 
source api fails with error for string, and boolean column types.
 Key: SPARK-10170
 URL: https://issues.apache.org/jira/browse/SPARK-10170
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.4.1, 1.5.0
Reporter: Suresh Thalamati


Repro :
-- start spark shell with classpath set to the db2 jdbc driver. 
SPARK_CLASSPATH=~/myjars/db2jcc.jar ./spark-shell

// set connection properties 
val properties = new java.util.Properties()
properties.setProperty("user", "user")
properties.setProperty("password", "password")

// load the driver.
Class.forName("com.ibm.db2.jcc.DB2Driver").newInstance

// create a data frame with a String column
val empdf = sc.parallelize(Array((1, "John"), (2, "Mike"))).toDF("id", "name")
// write the data frame.  this will fail with an error.  
empdf.write.jdbc("jdbc:db2://bdvs150.svl.ibm.com:6/SAMPLE:retrieveMessagesFromServerOnGetMessage=true;",
 "emp_data", properties)

Error :
com.ibm.db2.jcc.am.SqlSyntaxErrorException: TEXT
at com.ibm.db2.jcc.am.fd.a(fd.java:679)
at com.ibm.db2.jcc.am.fd.a(fd.java:60)
..


// create a data frame with String and Boolean columns 
val empdf = sc.parallelize(Array((1, "true".toBoolean), (2, 
"false".toBoolean))).toDF("id", "isManager")
// write the data frame.  this will fail with an error.  
empdf.write.jdbc("jdbc:db2://server:port/SAMPLE:retrieveMessagesFromServerOnGetMessage=true;",
 "emp_data", properties)

Error :
com.ibm.db2.jcc.am.SqlSyntaxErrorException: TEXT
at com.ibm.db2.jcc.am.fd.a(fd.java:679)
at com.ibm.db2.jcc.am.fd.a(fd.java:60)

The write fails because, by default, the JDBC data source generates a table 
schema with data types DB2 does not support: TEXT for String and BIT(1) for 
Boolean. String type should get mapped to CLOB/VARCHAR, and Boolean type 
should be mapped to CHAR(1) for the DB2 database.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10170) Writing from data frame into db2 database using jdbc data source api fails with error for string, and boolean column types.

2015-08-22 Thread Suresh Thalamati (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708124#comment-14708124
 ] 

Suresh Thalamati commented on SPARK-10170:
--

Scanning through the code I found that there is an 
org.apache.spark.sql.jdbc.JdbcDialects class that nicely defines an interface 
for handling data type differences between databases and already has 
implementations for MySQL and Postgres. Following the same approach, I added a 
DB2 dialect that maps StringType to CLOB, and it fixed the issue I ran into. I 
am working on submitting the patch.
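
A minimal sketch of that kind of dialect, modeled on Spark's JdbcDialect
extension point (the object name is an assumption; this is not the submitted
patch):

import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcType}
import org.apache.spark.sql.types._

case object DB2DialectSketch extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:db2")

  // Override the CREATE TABLE types DB2 cannot accept by default.
  override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
    case StringType  => Some(JdbcType("CLOB", java.sql.Types.CLOB))
    case BooleanType => Some(JdbcType("CHAR(1)", java.sql.Types.CHAR))
    case _           => None
  }
}

// Register once before writing:
// org.apache.spark.sql.jdbc.JdbcDialects.registerDialect(DB2DialectSketch)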

 Writing from data frame into db2 database using jdbc data source api fails 
 with error for string, and boolean column types.
 ---

 Key: SPARK-10170
 URL: https://issues.apache.org/jira/browse/SPARK-10170
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.4.1, 1.5.0
Reporter: Suresh Thalamati

 Repro :
 -- start spark shell with classpath set to the db2 jdbc driver. 
 SPARK_CLASSPATH=~/myjars/db2jcc.jar ./spark-shell
  
 // set connection properties 
 val properties = new java.util.Properties()
 properties.setProperty("user", "user")
 properties.setProperty("password", "password")
 // load the driver.
 Class.forName("com.ibm.db2.jcc.DB2Driver").newInstance
 // create a data frame with a String column
 val empdf = sc.parallelize(Array((1, "John"), (2, "Mike"))).toDF("id", "name")
 // write the data frame.  this will fail with an error.  
 empdf.write.jdbc("jdbc:db2://bdvs150.svl.ibm.com:6/SAMPLE:retrieveMessagesFromServerOnGetMessage=true;",
  "emp_data", properties)
 Error :
 com.ibm.db2.jcc.am.SqlSyntaxErrorException: TEXT
   at com.ibm.db2.jcc.am.fd.a(fd.java:679)
   at com.ibm.db2.jcc.am.fd.a(fd.java:60)
 ..
 // create a data frame with String and Boolean columns 
 val empdf = sc.parallelize(Array((1, "true".toBoolean), (2, 
 "false".toBoolean))).toDF("id", "isManager")
 // write the data frame.  this will fail with an error.  
 empdf.write.jdbc("jdbc:db2://server:port 
 /SAMPLE:retrieveMessagesFromServerOnGetMessage=true;", "emp_data", properties)
 Error :
 com.ibm.db2.jcc.am.SqlSyntaxErrorException: TEXT
   at com.ibm.db2.jcc.am.fd.a(fd.java:679)
   at com.ibm.db2.jcc.am.fd.a(fd.java:60)
 The write fails because, by default, the JDBC data source generates a table 
 schema with data types DB2 does not support: TEXT for String and BIT(1) for 
 Boolean. String type should get mapped to CLOB/VARCHAR, and Boolean type 
 should be mapped to CHAR(1) for the DB2 database.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: [VOTE] Ole Solberg as a committer

2009-06-04 Thread Suresh Thalamati
+1

On Thu, May 28, 2009 at 7:13 AM, Rick Hillegas richard.hille...@sun.comwrote:

 Please vote on whether we should make Ole Solberg a committer. The vote
 closes at 5:00 pm San Francisco time on Thursday June 4.

 For 4+ years Ole has been a significant contributor to Derby. We all rely
 heavily on Ole's failure analysis and on the tests and testing
 infrastructure which Ole continues to extend. Let's make Ole even more
 productive by minting him as a committer.

 Regards,
 -Rick



Re: [VOTE] John H Embretsen as a Derby committer

2008-04-01 Thread Suresh Thalamati
+1


On Wed, Mar 26, 2008 at 9:13 AM, Daniel John Debrunner [EMAIL PROTECTED]
wrote:

 John is actively involved on both the derby-dev and derby-user lists and
 fully engages in open development. He has had a number of patches
 committed, most recently taking the stalled JMX work and getting it into
 a shape where it could be committed to allow others to get involved.

 Vote closes 2007-04-02 16:00 PDT

 Dan.



[jira] Created: (DERBY-3367) Sort is not avoided even when the has an index on a the column being ordered, for a query with id != -1 predicate.

2008-01-30 Thread Suresh Thalamati (JIRA)
Sort is not avoided even when the has an index on a the column being ordered,  
for a query with id != -1 predicate.
---

 Key: DERBY-3367
 URL: https://issues.apache.org/jira/browse/DERBY-3367
 Project: Derby
  Issue Type: Improvement
  Components: SQL
Affects Versions: 10.3.2.1
Reporter: Suresh Thalamati
 Attachments: derby.log

Sort is not avoided even when there is an index on the column being ordered. 

Repro:

go.ddl:
---

connect 'jdbc:derby:testdb;create=true';

create table t1 (i int, j int, vc varchar(30));
insert into t1 values (1, -1, 'minus one');
insert into t1 values (2, 2, 'two'), (3, 3, 'trois'), (3, -3, 'minus three'), 
(4, 4, 'four');

insert into t1 select * from t1 where j > 0;
insert into t1 select * from t1 where j > 0;
insert into t1 select * from t1 where j > 0;
insert into t1 select * from t1 where j > 0;
insert into t1 select * from t1 where j > 0;
insert into t1 select * from t1 where j > 0;
insert into t1 select * from t1 where j > 0;
insert into t1 select * from t1 where j > 0;
insert into t1 select * from t1 where j > 0;
insert into t1 select * from t1 where j > 0;

create index ix on t1 (j);
disconnect all;

exit;

go.sql:
---

connect 'jdbc:derby:testdb';

get cursor c1 as 'select j, vc from t1 order by j asc';
next c1;
close c1;

get cursor c1 as 'select j, vc from t1 where j != -1 order by j asc';
next c1;
close c1;



--

After running go.sql, if you look at the derby.log file you'll see that the 
query with no predicate does an index scan and only has to read 1 row from disk 
before the cursor is closed.  But the query _with_ a predicate does a table 
scan and has to read 3074 rows from disk, and sort them, just to return 
the first one in the result set. 

In the repro it looks fast, but the data was large in my application. 
The table was: 
create table t2 (i int, j int, vc varchar(15000)); 
and it was loaded with 13000 rows. It takes almost a minute to get the first 
row for the query 'select j, vc from t1 where j != -1 order by j asc'.












-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (DERBY-3367) Sort is not avoided even when the has an index on a the column being ordered, for a query with id != -1 predicate.

2008-01-30 Thread Suresh Thalamati (JIRA)

 [ 
https://issues.apache.org/jira/browse/DERBY-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Thalamati updated DERBY-3367:


Attachment: derby.log

Derby log with the query plans.


 Sort is not avoided even when the has an index on a the column being ordered, 
  for a query with id != -1 predicate.
 ---

 Key: DERBY-3367
 URL: https://issues.apache.org/jira/browse/DERBY-3367
 Project: Derby
  Issue Type: Improvement
  Components: SQL
Affects Versions: 10.3.2.1
Reporter: Suresh Thalamati
 Attachments: derby.log


 Sort is not avoided even when the has an index on a the column being ordered, 
 Repro:
 go.ddl:
 ---
 connect 'jdbc:derby:testdb;create=true';
 create table t1 (i int, j int, vc varchar(30));
 insert into t1 values (1, -1, 'minus one');
 insert into t1 values (2, 2, 'two'), (3, 3, 'trois'), (3, -3, 'minus three'), 
 (4, 4, 'four');
 insert into t1 select * from t1 where j > 0;
 insert into t1 select * from t1 where j > 0;
 insert into t1 select * from t1 where j > 0;
 insert into t1 select * from t1 where j > 0;
 insert into t1 select * from t1 where j > 0;
 insert into t1 select * from t1 where j > 0;
 insert into t1 select * from t1 where j > 0;
 insert into t1 select * from t1 where j > 0;
 insert into t1 select * from t1 where j > 0;
 insert into t1 select * from t1 where j > 0;
 create index ix on t1 (j);
 disconnect all;
 exit;
 go.sql:
 ---
 connect 'jdbc:derby:testdb';
 get cursor c1 as 'select j, vc from t1 order by j asc';
 next c1;
 close c1;
 get cursor c1 as 'select j, vc from t1 where j != -1 order by j asc';
 next c1;
 close c1;
 --
 After running go.sql, if you look at the derby.log file you'll see that the 
 query with no predicate does an index scan and only has to read 1 row from 
 disk 
 before the cursor is closed.  But the query _with_ a predicate does a table
 scan an has to read 3074 rows from disk, and sort them, just to return 
 the first one in the result set. 
 In the repro, it looks fast. But If the data is large, 
 which was the case in my  application.  
 The table was: 
 create table t2 (i int, j int, vc varchar(15000)); 
 and loaded with 13000 rows. It takes almost minute to get the first row ,
 for the query select j, vc from t1 where j != -1 order by j asc'

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (DERBY-3367) Sort is not avoided even when the has an index on a the column being ordered, for a query with id != -1 predicate.

2008-01-30 Thread Suresh Thalamati (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12564205#action_12564205
 ] 

Suresh Thalamati commented on DERBY-3367:
-

Thanks for your comments, Mike. I am not looking for Derby to optimize the time 
to return the first row; the application needs all the rows returned by the 
query. But it processes the rows as it gets them and shows results iteratively 
to the user. Because of the sorting, it takes time to get the first row as 
well, which makes it look as if the application is hung.

My observation was that, even in ij, selecting all the rows without the != -1 
qualifier was faster than with the qualifier.

I was also surprised it used the index without the qualifier but not with the 
qualifier. It turns out to be a good decision by the optimizer if the sort is 
external and spilling to disk when the data size is large. 





 Sort is not avoided even when the has an index on a the column being ordered, 
  for a query with id != -1 predicate.
 ---

 Key: DERBY-3367
 URL: https://issues.apache.org/jira/browse/DERBY-3367
 Project: Derby
  Issue Type: Improvement
  Components: SQL
Affects Versions: 10.3.2.1
Reporter: Suresh Thalamati
 Attachments: derby.log


 Sort is not avoided even when the has an index on a the column being ordered, 
 Repro:
 go.ddl:
 ---
 connect 'jdbc:derby:testdb;create=true';
 create table t1 (i int, j int, vc varchar(30));
 insert into t1 values (1, -1, 'minus one');
 insert into t1 values (2, 2, 'two'), (3, 3, 'trois'), (3, -3, 'minus three'), 
 (4, 4, 'four');
 insert into t1 select * from t1 where j > 0;
 insert into t1 select * from t1 where j > 0;
 insert into t1 select * from t1 where j > 0;
 insert into t1 select * from t1 where j > 0;
 insert into t1 select * from t1 where j > 0;
 insert into t1 select * from t1 where j > 0;
 insert into t1 select * from t1 where j > 0;
 insert into t1 select * from t1 where j > 0;
 insert into t1 select * from t1 where j > 0;
 insert into t1 select * from t1 where j > 0;
 create index ix on t1 (j);
 disconnect all;
 exit;
 go.sql:
 ---
 connect 'jdbc:derby:testdb';
 get cursor c1 as 'select j, vc from t1 order by j asc';
 next c1;
 close c1;
 get cursor c1 as 'select j, vc from t1 where j != -1 order by j asc';
 next c1;
 close c1;
 --
 After running go.sql, if you look at the derby.log file you'll see that the 
 query with no predicate does an index scan and only has to read 1 row from 
 disk 
 before the cursor is closed.  But the query _with_ a predicate does a table
 scan an has to read 3074 rows from disk, and sort them, just to return 
 the first one in the result set. 
 In the repro, it looks fast. But If the data is large, 
 which was the case in my  application.  
 The table was: 
 create table t2 (i int, j int, vc varchar(15000)); 
 and loaded with 13000 rows. It takes almost minute to get the first row ,
 for the query select j, vc from t1 where j != -1 order by j asc'

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (DERBY-700) Derby does not prevent dual boot of database from different classloaders on Linux

2007-06-14 Thread Suresh Thalamati (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12504943
 ] 

Suresh Thalamati commented on DERBY-700:


I agree with Dan,  getLockedFile() is confusing and should not be added 
to the storage factory interfaces , if it can be avoided or  have better 
comments.
 
Just thought I will take a moment and explain, why in the first place 
I added this method, I hope it will help in finding an alternative solution 
or make the interface better. 

I was testing and developing my solution on Windows. When I first implemented 
it without adding the getLockedFile() method, by just getting the 
RandomAccessFile using StorageFile.getRandomAccessFile() after the file was 
locked, I was hitting the error java.io.IOException: 

"The process cannot access the file because another process has locked a 
portion of the file." when writing the UUID to the dbex.lck file. 


After a bit of debugging I realized I need access to the same RandomAccessFile 
that is used to get the lock. So I simply added getLockedFile(), which just 
returns the same RandomAccessFile that is used to get the file lock.  


import java.io.File;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

public class writeLock {
    public static void main(String[] args) throws Exception {
        File lf = new File("dbex.lck");
        RandomAccessFile raf1 = new RandomAccessFile(lf, "rw");
        FileChannel fc = raf1.getChannel();
        FileLock lock = fc.tryLock();

        // attempt to write to the locked file using another RandomAccessFile
        RandomAccessFile raf2 = new RandomAccessFile(lf, "rw");
        // the following write fails on Windows
        raf2.writeChars("Derby is great");
    }
}

For example, in the above code the last write will fail with the error:
Exception in thread "main" java.io.IOException: The process cannot access the 
file because another process has locked a portion of the file. 

I did not verify whether this is the case in non-Windows environments too. This 
fix is mainly for non-Windows environments, so if this is not an issue on other 
platforms then getLockedFile() can be replaced with 
StorageFile.getRandomAccessFile(). My concern is how to confirm that all 
non-Windows environments will not give the above error. 

First I thought of putting the new intra-JVM lock code in the storage 
factory, where getExclusiveFileLock() is implemented. To me that looked like a 
worse alternative than the getLockedFile() method, because then the 
getExclusiveFileLock() method semantics get even more confusing. 

Another solution I was thinking of, which may work without adding 
getLockedFile(), is to use a range lock, like FileLock lock = 
fc.tryLock(0, 10, false), and write to the file after the 10th byte, as in the 
sketch below. This may need a new getExclusiveLock() method that takes a range.
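
A minimal sketch of that range-lock idea (illustrative only, not Derby code;
whether the write outside the locked range succeeds on every platform would
still need to be verified):

import java.io.{File, RandomAccessFile}

object RangeLockSketch extends App {
  val lockFile = new File("dbex.lck")
  val raf = new RandomAccessFile(lockFile, "rw")
  // exclusive lock on bytes [0, 10) only
  val lock = raf.getChannel.tryLock(0L, 10L, false)

  // write through a second handle, staying outside the locked range
  val writer = new RandomAccessFile(lockFile, "rw")
  writer.seek(10L)
  writer.writeChars("jvm-instance-id")
  writer.close()
}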

Maybe a simpler solution is to add a new getExclusiveFileLock() method that 
takes a RandomAccessFile as the argument.

To start with, the reason we are in this mess is that the getExclusiveLock() 
implementation does not match the way the Java interfaces work. Ideally the 
store code should hold on to the RandomAccessFile used for the lock, 
not the storage factory implementation. Ideally StorageRandomAccessFile 
should give a handle to a StorageFileChannel, which has the tryLock() method 
that returns a FileLock.  


Thanks
-suresh



 Derby does not prevent dual boot of database from different classloaders on 
 Linux
 -

 Key: DERBY-700
 URL: https://issues.apache.org/jira/browse/DERBY-700
 Project: Derby
  Issue Type: Bug
  Components: Store
Affects Versions: 10.1.2.1
 Environment: ava -version
 java version 1.4.2_08
 Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_08-b03)
 Java HotSpot(TM) Client VM (build 1.4.2_08-b03, mixed mode)
Reporter: Kathey Marsden
Assignee: Kathey Marsden
Priority: Critical
 Fix For: 10.3.0.0

 Attachments: DERBY-700.diff, DERBY-700.stat, 
 derby-700_06_07_07_diff.txt, derby-700_06_07_07_stat.txt, derby-700_diff.txt, 
 derby-700_stat.txt, DERBY-700_v1_use_to_run_DualBootrepro_multithreaded.diff, 
 DERBY-700_v1_use_to_run_DualBootrepro_multithreaded.stat, 
 derby-700_with_NPE_fix_diff.txt, derby-700_with_NPE_fix_stat.txt, derby.log, 
 derby700_singleproperty_v1.diff, derby700_singleproperty_v1.stat, 
 DualBootRepro.java, DualBootRepro2.zip, DualBootRepro_mutltithreaded.tar.bz2, 
 releaseNote.html


 Derby does not prevent dual boot from two different classloaders on Linux.
 To reproduce run the  program DualBootRepro with no derby jars in your 
 classpath. The program assumes derby.jar is in 10.1.2.1/derby.jar, you can 
 change the location by changing the DERBY_LIB_DIR variable.
 On Linux the output is:
 $java -cp . DualBootRepro
 Loading derby from file:10.1.2.1/derby.jar
 10.1.2.1/derby.jar
 Booted

Re: 10.3 release coming up...fast!

2007-05-24 Thread Suresh Thalamati

Myrna van Lunteren wrote:

Hi!

We have now about 7 days before the code complete date of 6/1/07!



Thanks for volunteering to be release manager, Myrna.
6/1 sounds good to me. I am not planning do any more
checkins, for the 10.3 release.

Thanks
-suresh



[jira] Updated: (DERBY-378) support for import/export of tables with clob/blob and the other binary data types will be good addition to derby,

2007-05-23 Thread Suresh Thalamati (JIRA)

 [ 
https://issues.apache.org/jira/browse/DERBY-378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Thalamati updated DERBY-378:
---

Fix Version/s: 10.3.0.0

 support for  import/export  of  tables with clob/blob and the other binary 
 data types   will be good addition to derby,
 ---

 Key: DERBY-378
 URL: https://issues.apache.org/jira/browse/DERBY-378
 Project: Derby
  Issue Type: Improvement
  Components: Tools
Affects Versions: 10.1.1.0
Reporter: Suresh Thalamati
 Assigned To: Suresh Thalamati
 Fix For: 10.3.0.0

 Attachments: derby378_1.diff, derby378_1.stat, derby378_2.diff, 
 derby378_2.stat, derby378_3.diff, derby378_3.stat, derby378_4.diff, 
 derby378_4.stat, derby378_5.diff, derby378_6.diff, iexlobs.txt, iexlobs_v1.txt


 Currently if I have a table that contains a clob/blob column, import/export 
 operations on that table
 throw an unsupported feature exception. 
 set schema iep;
 set schema iep;
 create table ntype(a int , ct CLOB(1024));
 create table ntype1(bt BLOB(1024) , a int);
 call SYSCS_UTIL.SYSCS_EXPORT_TABLE ('iep', 'ntype' , 'extinout/ntype.dat' ,
  null, null, null) ;
 ERROR XIE0B: Column 'CT' in the table is of type CLOB, it is not supported by 
 th
 e import/export feature.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (DERBY-378) support for import/export of tables with clob/blob and the other binary data types will be good addition to derby,

2007-05-23 Thread Suresh Thalamati (JIRA)

 [ 
https://issues.apache.org/jira/browse/DERBY-378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Thalamati resolved DERBY-378.


Resolution: Fixed

 support for  import/export  of  tables with clob/blob and the other binary 
 data types   will be good addition to derby,
 ---

 Key: DERBY-378
 URL: https://issues.apache.org/jira/browse/DERBY-378
 Project: Derby
  Issue Type: Improvement
  Components: Tools
Affects Versions: 10.1.1.0
Reporter: Suresh Thalamati
 Assigned To: Suresh Thalamati
 Fix For: 10.3.0.0

 Attachments: derby378_1.diff, derby378_1.stat, derby378_2.diff, 
 derby378_2.stat, derby378_3.diff, derby378_3.stat, derby378_4.diff, 
 derby378_4.stat, derby378_5.diff, derby378_6.diff, iexlobs.txt, iexlobs_v1.txt


 Currently if I have a table that contains a clob/blob column, import/export 
 operations on that table
 throw an unsupported feature exception. 
 set schema iep;
 set schema iep;
 create table ntype(a int , ct CLOB(1024));
 create table ntype1(bt BLOB(1024) , a int);
 call SYSCS_UTIL.SYSCS_EXPORT_TABLE ('iep', 'ntype' , 'extinout/ntype.dat' ,
  null, null, null) ;
 ERROR XIE0B: Column 'CT' in the table is of type CLOB, it is not supported by 
 th
 e import/export feature.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (DERBY-700) Derby does not prevent dual boot of database from different classloaders on Linux

2007-05-22 Thread Suresh Thalamati (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12498050
 ] 

Suresh Thalamati commented on DERBY-700:


Kathey Marsden [22/May/07 03:22 PM]: Thanks Suresh for the patch! I 
was wondering, why do we need 
setContextClassLoader privileges for derby.jar with the change? 

I must have added that permission while debugging to run the test under the 
security manager. 
The only security permission that is required for derby.jar is read/write of 
the derby.storage.jvmInstanceID 
property.

Thanks for volunteering to fix the bug, Kathey.


 Derby does not prevent dual boot of database from different classloaders on 
 Linux
 -

 Key: DERBY-700
 URL: https://issues.apache.org/jira/browse/DERBY-700
 Project: Derby
  Issue Type: Bug
  Components: Store
Affects Versions: 10.1.2.1
 Environment: ava -version
 java version 1.4.2_08
 Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_08-b03)
 Java HotSpot(TM) Client VM (build 1.4.2_08-b03, mixed mode)
Reporter: Kathey Marsden
 Assigned To: Kathey Marsden
Priority: Critical
 Attachments: DERBY-700.diff, DERBY-700.stat, 
 DERBY-700_v1_use_to_run_DualBootrepro_multithreaded.diff, 
 DERBY-700_v1_use_to_run_DualBootrepro_multithreaded.stat, 
 derby700_singleproperty_v1.diff, derby700_singleproperty_v1.stat, 
 DualBootRepro.java, DualBootRepro2.zip, DualBootRepro_mutltithreaded.tar.bz2


 Derby does not prevent dual boot from two different classloaders on Linux.
 To reproduce run the  program DualBootRepro with no derby jars in your 
 classpath. The program assumes derby.jar is in 10.1.2.1/derby.jar, you can 
 change the location by changing the DERBY_LIB_DIR variable.
 On Linux the output is:
 $java -cp . DualBootRepro
 Loading derby from file:10.1.2.1/derby.jar
 10.1.2.1/derby.jar
 Booted database in loader [EMAIL PROTECTED]
 FAIL: Booted database in 2nd loader [EMAIL PROTECTED]
 On Windows I get the expected output.
 $ java -cp . DualBootRepro
 Loading derby from file:10.1.2.1/derby.jar
 10.1.2.1/derby.jar
 Booted database in loader [EMAIL PROTECTED]
 PASS: Expected exception for dualboot:Another instance of Derby may have 
 already booted the database D:\marsden\repro\dualboot\mydb.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (DERBY-2527) Add documentation for import/export of LOBS and other binary data types.

2007-05-18 Thread Suresh Thalamati (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12497107
 ] 

Suresh Thalamati commented on DERBY-2527:
-

Thanks for addressing my comments, Laura. I read through your comments,
looks like you need my input only for this one:

We still need an example for
SYSCS_UTIL.SYSCS_EXPORT_QUERY_LOBS_TO_EXTFILE 

Yes, it is similar to the other SYSCS_EXPORT_QUERY procedures, but I agree with 
you that it will be good to have an example for this one too. 


Example exporting data from a query, using a separate export file for the LOB
data

The following example shows how to export employee data in department 20 from 
the
STAFF table in a sample database to the main file staff.del and the lob
data to the file pictures.dat.


CALL SYSCS_UTIL.SYSCS_EXPORT_QUERY_LOBS_TO_EXTFILE(
 'SELECT * FROM STAFF WHERE dept=20',
 'c:\data\staff.del', ',','','UTF-8','c:\data\pictures.dat');


 Add documentation for  import/export  of LOBS and other binary data types. 
 ---

 Key: DERBY-2527
 URL: https://issues.apache.org/jira/browse/DERBY-2527
 Project: Derby
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 10.3.0.0
Reporter: Suresh Thalamati
 Assigned To: Laura Stewart
 Attachments: derby2527_1.diff, derbytools.pdf, derbytools.pdf, 
 iexlobs_v1.txt, refderby.pdf, refderby.pdf




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (DERBY-2527) Add documentation for import/export of LOBS and other binary data types.

2007-05-18 Thread Suresh Thalamati (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12497112
 ] 

Suresh Thalamati commented on DERBY-2527:
-

Thanks Laura, I reviewed the import/export related sections in the attached
pdf files, changes looks good. Please commit the changes. 

Some Minor things I noticed when I was reading through the doc again :

Derby tools Guide :

1) 

On Page 46 :

It says , in the second section : The import and export procedures read and
write only text files   not true any more , import/export writes and reads 
binary data from the external lob file. 

Please remove that sentence or change it to the following :

The import and export procedures read and write only text files, except when 
blob data is imported or exported using an external file.
  

2) on Page 47 :  section :Bulk import and export requirements and 
considerations:


Restrictions on delimiters
You cannot specify Hex decimal characters (0-9, a-f, A-F) as delimiters for the 
import
and export procedures

you may want to remove the above , I don't want users to think that is the only
delimiter restriction. 

This restriction is already listed under File format for input and output on 
page page 49 :( Delimiters cannot be hex decimal characters (0-9, a-f, A-F).)
along with other delimiter restrictions. 

3) I really like the way you listed procedures. One minor thing I noticed is : 

a) 

On page 50 :

1. Choose the correct procedure for the type of import that you want to perform:

and 

For examples using these procedures, see Examples of bulk import and export.
Derby Tools and Utilities Guide
52

looks out of place. 


you may want to have  heading as :  Import Procedures 

and the below/above the table , write something like:

Choose the correct procedure for the type of import that you want to perform
from the  table. For examples using these procedures, 
see Examples of bulk import and export.Derby Tools and Utilities Guide


b) on page 52 , please do the same for export procedures.




Derby Reference Guide :


On page 126: SYSCS_UTIL.SYSCS_EXPORT_TABLE_LOBS_TO_EXTFILE  example:

CALL SYSCS_UTIL.SYSCS_EXPORT_TABLE_LOBS_TO_EXTFILE(
'APP', 'STAFF', 'c:\data\staff.del', ', ', '',
'UTF-8', 'c:\data\pictures.dat')

delete the space in the column delimiter parameter value : ', '   
it should be just ','


Thanks
-suresh


 Add documentation for  import/export  of LOBS and other binary data types. 
 ---

 Key: DERBY-2527
 URL: https://issues.apache.org/jira/browse/DERBY-2527
 Project: Derby
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 10.3.0.0
Reporter: Suresh Thalamati
 Assigned To: Laura Stewart
 Attachments: derby2527_1.diff, derbytools.pdf, derbytools.pdf, 
 iexlobs_v1.txt, refderby.pdf, refderby.pdf




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (DERBY-700) Derby does not prevent dual boot of database from different classloaders on Linux

2007-05-17 Thread Suresh Thalamati (JIRA)

 [ 
https://issues.apache.org/jira/browse/DERBY-700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Thalamati updated DERBY-700:
---

Attachment: derby700_singleproperty_v1.stat
derby700_singleproperty_v1.diff

Attached is a patch with partial implementation of  the intra-jvm db 
lock mechanism to prevent users from running multiple instances of the same
database using different class loaders in the same jvm. Existing 
solution already prevents users from running multiple instances across jvms.

Although I have not assigned this issue to myself, I have been working on this 
issue on and off for some time, but I don't have free cycles to work on this 
bug for the upcoming release. I am posting the patch with whatever work I have 
done so far. If someone can enhance the attached patch and fix this issue, 
that will be great.

The intra-JVM db lock is provided by using a global Derby-specific JVM instance 
id. On the first boot of any Derby database in a JVM, a UUID is generated and 
stored in a system property (derby.storage.jvmInstanceID). This id is written 
to the dbex.lck file when the database is booted. If the ID in the dbex.lck 
file and the current JVM id stored in the system property 
derby.storage.jvmInstanceID are the same, then the database is already booted: 
an exception will be thrown and the database will not be booted. On a database 
shutdown, an invalid JVM id is written to the dbex.lck file, so that if the 
database is booted again the IDs will not be equal, which means the database is 
not already booted and the boot will succeed. I am using the existing UUID 
factory in Derby to generate the Derby JVM id.

Synchronization is done using interned strings. Synchronization across the JVM 
is done on the derby.storage.jvmInstanceID string, and synchronization for a 
database is done on the database directory. This may need to be changed to a 
database name, because it may not always be possible to get the canonical path, 
and it is necessary to synchronize on a string that is unique to a database in 
a JVM. A rough sketch of the scheme is below.
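
A rough sketch of that scheme (illustrative only, with assumed names; the
actual work lives in the attached diffs):

import java.io.{File, RandomAccessFile}
import java.util.UUID

object JvmInstanceLockSketch {
  private val IdProperty = "derby.storage.jvmInstanceID"

  // One id per JVM, created lazily on the first database boot.
  private def jvmInstanceId: String = IdProperty.intern.synchronized {
    Option(System.getProperty(IdProperty)).getOrElse {
      val id = UUID.randomUUID().toString
      System.setProperty(IdProperty, id)
      id
    }
  }

  /** Returns false if this JVM has already booted the database directory. */
  def tryBootLock(dbDir: File): Boolean =
    dbDir.getCanonicalPath.intern.synchronized {
      val raf = new RandomAccessFile(new File(dbDir, "dbex.lck"), "rw")
      try {
        val onDisk = Option(raf.readLine()).getOrElse("")
        if (onDisk == jvmInstanceId) false      // already booted in this JVM
        else {
          raf.setLength(0)
          raf.writeBytes(jvmInstanceId + "\n")
          true
        }
      } finally raf.close()
    }
}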
 

In my earlier proposed solution I mentioned releasing the db locks using a 
finalizer. After doing some testing, I realized there is no need to address 
unlocking the database on garbage collection using a finalizer. My 
understanding is that unless users shut down the database, the rawStoreDaemon 
and antiGC threads will hold on to the resources and the classes will not be 
unloaded. So the only way users can boot an already booted but no longer used 
database from another class loader instance in the same JVM is by first doing 
a shutdown from the class loader that booted the database. If someone thinks 
this is not true, please correct me. 

To do :

1) Clean up error handling on IOExceptions and add a new message for the 
   intra-JVM db lock. 

2) Currently the dataDirectory path string is used for synchronization to 
   prevent multiple boots of a database. This may need to be changed to 
   the db name.

3) Make classLoaderBootTest.java run under the security manager. 

4) Add test cases for booting different databases in parallel on different 
threads with different class loaders. This may not really be required, because 
even booting a single database through different threads should test 
the same thing. But it may be better to add a test case, just to be safe!


5) Run the test with a large number of threads.

6) Anything else I have forgotten!
  
 
Thanks 
-suresh



 Derby does not prevent dual boot of database from different classloaders on 
 Linux
 -

 Key: DERBY-700
 URL: https://issues.apache.org/jira/browse/DERBY-700
 Project: Derby
  Issue Type: Bug
  Components: Store
Affects Versions: 10.1.2.1
 Environment: ava -version
 java version 1.4.2_08
 Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_08-b03)
 Java HotSpot(TM) Client VM (build 1.4.2_08-b03, mixed mode)
Reporter: Kathey Marsden
Priority: Critical
 Attachments: DERBY-700.diff, DERBY-700.stat, 
 DERBY-700_v1_use_to_run_DualBootrepro_multithreaded.diff, 
 DERBY-700_v1_use_to_run_DualBootrepro_multithreaded.stat, 
 derby700_singleproperty_v1.diff, derby700_singleproperty_v1.stat, 
 DualBootRepro.java, DualBootRepro2.zip, DualBootRepro_mutltithreaded.tar.bz2


 Derby does not prevent dual boot from two different classloaders on Linux.
 To reproduce run the  program DualBootRepro with no derby jars in your 
 classpath. The program assumes derby.jar is in 10.1.2.1/derby.jar, you can 
 change the location by changing the DERBY_LIB_DIR variable.
 On Linux the output is:
 $java -cp . DualBootRepro
 Loading derby from file:10.1.2.1/derby.jar
 10.1.2.1/derby.jar
 Booted database in loader [EMAIL PROTECTED]
 FAIL: Booted database in 2nd loader [EMAIL PROTECTED

[jira] Created: (DERBY-2649) An unsuccessful boot attempt of an booted database can potentially delete files in the temp directory that are in use.

2007-05-15 Thread Suresh Thalamati (JIRA)
An unsuccessful boot attempt of an booted database can potentially delete files 
in the temp directory that are in use. 
---

 Key: DERBY-2649
 URL: https://issues.apache.org/jira/browse/DERBY-2649
 Project: Derby
  Issue Type: Bug
  Components: Store
Affects Versions: 10.2.2.0
Reporter: Suresh Thalamati


The lock to prevent multi-JVM boot is acquired after the temp directory is 
cleaned up in the BaseDataFileDirectory.java boot() method. Because the lock is 
acquired later, an unsuccessful boot attempt could potentially delete files in 
the temp directory that are in use. 

See : BaseDataFileDirectory.java : boot()
   storageFactory =
ps.getStorageFactoryInstance(
true,
dataDirectory,
startParams.getProperty(
Property.STORAGE_TEMP_DIRECTORY,
PropertyUtil.getSystemProperty(
Property.STORAGE_TEMP_DIRECTORY)),
identifier.toANSIidentifier());

The above call to get the storage factory seems to clean up the temp directory, 
and it is invoked before calling 
the method that prevents multi-JVM boot of a database. 

if (!isReadOnly())  // read only db, not interested 
in filelock
getJBMSLockOnDB(identifier, uf, dataDirectory);




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (DERBY-2020) Change file option for syncing log file to disk from rws to rwd

2007-04-23 Thread Suresh Thalamati (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12491013
 ] 

Suresh Thalamati commented on DERBY-2020:
-

Thanks for addressing my comments, Olav. The latest patch looks good. 


 Change file option for syncing log file to disk from rws to rwd
 ---

 Key: DERBY-2020
 URL: https://issues.apache.org/jira/browse/DERBY-2020
 Project: Derby
  Issue Type: Improvement
  Components: Performance, Store
Affects Versions: 10.3.0.0
Reporter: Olav Sandstaa
 Assigned To: Olav Sandstaa
 Attachments: disk-cache.png, jvmsyncbug.diff, jvmsyncbug.stat, 
 jvmsyncbug_v2.diff, jvmsyncbug_v2.stat, jvmsyncbug_v3.diff, 
 jvmsyncbug_v3.stat, no-disk-cache.png, rwd.diff, rwd.stat


 For writing the transaction log to disk Derby uses a
 RandomAccessFile. If it is supported by the JVM, the log files are
 opened in rws mode making the file system take care of syncing
 writes to disk. rws mode will ensure that both the data and the file
 meta-data is updated for every write to the file. On some operating
 systems (e.g. Solaris) this leads to two write operation to the disk
 for every write issued by Derby. This is limiting the throughput of
 update intensive applications.  If we could change the file mode to
 rwd this could reduce the number of updates to the disk.
 I have run some simple tests where I have changed mode from rws to
 rwd for the Derby log file. When running a small numbers of
 concurrent client threads the throughput is almost doubled and the
 response time is almost halved. I will attach some graphs that show
 this when running a given number of concurrent tpc-b like clients. These
 graphs show the throughput when running with rws and rwd mode when the
 disk's write cache has been enabled and disabled.
 I am creating this Jira to have a place where we can collect
 information about issues both for and against changing the default
 mode for writing to log files.
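A small standalone illustration of the difference between the two modes (not 
Derby code; the file name is made up):

    import java.io.IOException;
    import java.io.RandomAccessFile;

    public class SyncModeDemo {
        public static void main(String[] args) throws IOException {
            // "rws": every write forces both file content and file
            // metadata (length, timestamps) to the device, which can
            // mean two physical writes per log write.
            RandomAccessFile rws = new RandomAccessFile("log.dat", "rws");
            rws.write(new byte[8192]);
            rws.close();

            // "rwd": only the file content has to reach the device, so a
            // preallocated log file needs roughly one physical write per
            // update, which is why this mode helps the transaction log.
            RandomAccessFile rwd = new RandomAccessFile("log.dat", "rwd");
            rwd.write(new byte[8192]);
            rwd.close();
        }
    }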

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (DERBY-2527) Add documentation for import/export of LOBS and other binary data types.

2007-04-12 Thread Suresh Thalamati (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12488505
 ] 

Suresh Thalamati commented on DERBY-2527:
-

Thanks Laura. My comments are below for the questions you posted. 

1) 
 The phrase "performs single inserts ..." - should that be "performs single row 
inserts"?

Yes, "performs single row inserts" is better. 


2)
In the list of arguments, there is this text on many of the parameters: 
"Passing a null will result in an error." 
The current text might be confusing since other parameters allow a NULL 
value. 
I propose that we change it to: 
 "Omitting this parameter or passing a NULL value will result in an error." 
Is this new phrasing accurate? 

Omitting a parameter will result in an error for all the system procedures. I think
it is not necessary to say that explicitly for some cases. 
 

3) 
 I'm confused about the import examples. Are the insertColumns and
columnIndexes arguments optional? 

No. They are required arguments when using SYSCS_UTIL.SYSCS_IMPORT_DATA(..)
or SYSCS_UTIL.SYSCS_IMPORT_DATA_LOBS_FROM_EXTFILE(...).

They are not arguments for  SYSCS_UTIL.SYSCS_IMPORT_TABLE() and 
SYSCS_UTIL.SYSCS_IMPORT_TABLE_LOBS_FROM_EXTFILE(..)

My understanding of the doc in the tools guide is that the user
follows the syntax defined for the procedures and finds the explanation for 
the arguments on the "Arguments to import procedure" page 
(http://db.apache.org/derby/docs/10.2/tools/rtoolsimport64241.html), depending 
on which procedure he/she is using. 


If so, then we should state that in the topic
that describes the arguments. If not, then the examples need to be
updated. Please advise :-) 

If you find any examples that are not correct, please let me know and 
I will verify them. 


 Add documentation for  import/export  of LOBS and other binary data types. 
 ---

 Key: DERBY-2527
 URL: https://issues.apache.org/jira/browse/DERBY-2527
 Project: Derby
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 10.3.0.0
Reporter: Suresh Thalamati
 Assigned To: Laura Stewart
 Attachments: iexlobs_v1.txt




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (DERBY-2527) Add documentation for import/export of LOBS and other binary data types.

2007-04-12 Thread Suresh Thalamati (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12488527
 ] 

Suresh Thalamati commented on DERBY-2527:
-

#2: I agree with you. Changing the sentence to "Passing a NULL value will 
result in an error." makes it more clear.

#3: I will post an example using 
SYSCS_UTIL.SYSCS_IMPORT_DATA_LOBS_FROM_EXTFILE(...).

 Add documentation for  import/export  of LOBS and other binary data types. 
 ---

 Key: DERBY-2527
 URL: https://issues.apache.org/jira/browse/DERBY-2527
 Project: Derby
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 10.3.0.0
Reporter: Suresh Thalamati
 Assigned To: Laura Stewart
 Attachments: iexlobs_v1.txt




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (DERBY-2527) Add documentation for import/export of LOBS and other binary data types.

2007-04-10 Thread Suresh Thalamati (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487958
 ] 

Suresh Thalamati commented on DERBY-2527:
-

Thanks for volunteering to write the documentation, Laura. My comments are
below for the questions you posted. 


1) Regarding null arguments.
 
Yes. If the user passes a null value for any of these arguments to the 
import/export procedures, the default value is used. If the user passes null 
as the argument for the schema name, the current schema is used. The default 
value for the column delimiter is a comma (,) and the default value for the 
character delimiter is a double quote ("). The default for the codeset depends 
on the environment in which the user started the jvm. 


It might be better to have an example that does not use nulls, to be more
clear. Please use the following examples instead of the ones in the spec. 
 

CALL SYSCS_UTIL.SYSCS_EXPORT_TABLE_LOBS_TO_EXTFILE('APP','STAFF',
'c:\data\staff.del',',','"','UTF-8', 'c:\data\pictures.dat');


CALL SYSCS_UTIL.SYSCS_IMPORT_TABLE_LOBS_FROM_EXTFILE(
   'APP','STAFF','c:\data\staff.del',',','"','UTF-8',0);


2) Regarding how to document the definition of the procedures.

This is not my invention; I have just been following the same format as the
other ones already in the documentation. 

 Laura wrote:
SYSCS_UTIL.SYSCS_IMPORT_DATA (IN SCHEMANAME VARCHAR(128),
IN TABLENAME VARCHAR(128), IN INSERTCOLUMNS VARCHAR(32672),
IN COLUMNINDEXES VARCHAR(32672), IN FILENAME VARCHAR(32672),
IN COLUMNDELIMITER CHAR(1), IN CHARACTERDELIMITER CHAR(1),
 IN CODESET VARCHAR(128), IN REPLACE SMALLINT)

The syntax should appear in the docs as:

 SYSCS_UTIL.SYSCS_IMPORT_DATA(
SCHEMANAME, TABLENAME, INSERTCOLUMNS, COLUMNINDEXES, FILENAME, 
 COLUMNDELIMITER,
CHARACTERDELIMITER, CODESET, REPLACE
 )

No. It cannot be that simple. The types of the arguments need to be documented. 
IN indicates that it is an input parameter; a procedure can also have 
OUT or INOUT parameters. I find the above definition useful: just 
by looking at the syntax, I can set the correct parameters instead of reading 
through the whole doc. 
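For example (illustrative only; the connection URL, the data values, and the 
use of null for the column arguments are assumptions), the IN types above tell 
a JDBC caller exactly how to bind each position:

    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.DriverManager;

    public class ImportDataCall {
        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection("jdbc:derby:mydb");
            CallableStatement cs = conn.prepareCall(
                "CALL SYSCS_UTIL.SYSCS_IMPORT_DATA(?, ?, ?, ?, ?, ?, ?, ?, ?)");
            cs.setString(1, "APP");                  // SCHEMANAME  VARCHAR(128)
            cs.setString(2, "STAFF");                // TABLENAME   VARCHAR(128)
            cs.setString(3, null);                   // INSERTCOLUMNS (null = all columns)
            cs.setString(4, null);                   // COLUMNINDEXES (null = all fields)
            cs.setString(5, "c:\\data\\staff.del");  // FILENAME
            cs.setString(6, ",");                    // COLUMNDELIMITER CHAR(1)
            cs.setString(7, "\"");                   // CHARACTERDELIMITER CHAR(1)
            cs.setString(8, "UTF-8");                // CODESET
            cs.setShort(9, (short) 0);               // REPLACE SMALLINT (0 = append)
            cs.execute();
            cs.close();
            conn.close();
        }
    }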


Maybe the current way of documenting is not the best approach. If you have
some ideas to improve it, please file a separate jira. Maybe others will 
have some opinions. 
 

 Add documentation for  import/export  of LOBS and other binary data types. 
 ---

 Key: DERBY-2527
 URL: https://issues.apache.org/jira/browse/DERBY-2527
 Project: Derby
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 10.3.0.0
Reporter: Suresh Thalamati
 Assigned To: Laura Stewart
 Attachments: iexlobs_v1.txt




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (DERBY-378) support for import/export of tables with clob/blob and the other binary data types will be good addition to derby,

2007-04-05 Thread Suresh Thalamati (JIRA)

 [ 
https://issues.apache.org/jira/browse/DERBY-378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Thalamati updated DERBY-378:
---

Attachment: iexlobs_v1.txt

Updated the spec and added some notes for the documentation. 


 support for  import/export  of  tables with clob/blob and the other binary 
 data types   will be good addition to derby,
 ---

 Key: DERBY-378
 URL: https://issues.apache.org/jira/browse/DERBY-378
 Project: Derby
  Issue Type: Improvement
  Components: Tools
Affects Versions: 10.1.1.0
Reporter: Suresh Thalamati
 Assigned To: Suresh Thalamati
 Attachments: derby378_1.diff, derby378_1.stat, derby378_2.diff, 
 derby378_2.stat, derby378_3.diff, derby378_3.stat, derby378_4.diff, 
 derby378_4.stat, derby378_5.diff, derby378_6.diff, iexlobs.txt, iexlobs_v1.txt


 Currently if  I have  a table that contains clob/blob column,  import/export 
 operations on that table
 throghs  unsupported feature exception. 
 set schema iep;
 set schema iep;
 create table ntype(a int , ct CLOB(1024));
 create table ntype1(bt BLOB(1024) , a int);
 call SYSCS_UTIL.SYSCS_EXPORT_TABLE ('iep', 'ntype' , 'extinout/ntype.dat' ,
  null, null, null) ;
 ERROR XIE0B: Column 'CT' in the table is of type CLOB, it is not supported by 
 th
 e import/export feature.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (DERBY-2527) Add documentation for import/export of LOBS and other binary data types.

2007-04-05 Thread Suresh Thalamati (JIRA)
Add documentation for  import/export  of LOBS and other binary data types. 
---

 Key: DERBY-2527
 URL: https://issues.apache.org/jira/browse/DERBY-2527
 Project: Derby
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 10.3.0.0
Reporter: Suresh Thalamati
 Attachments: iexlobs_v1.txt



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (DERBY-2527) Add documentation for import/export of LOBS and other binary data types.

2007-04-05 Thread Suresh Thalamati (JIRA)

 [ 
https://issues.apache.org/jira/browse/DERBY-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Thalamati updated DERBY-2527:


Attachment: iexlobs_v1.txt

Updated version of the spec and some notes for the documentation. 


 Add documentation for  import/export  of LOBS and other binary data types. 
 ---

 Key: DERBY-2527
 URL: https://issues.apache.org/jira/browse/DERBY-2527
 Project: Derby
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 10.3.0.0
Reporter: Suresh Thalamati
 Attachments: iexlobs_v1.txt




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (DERBY-2020) Change file option for syncing log file to disk from rws to rwd

2007-04-03 Thread Suresh Thalamati (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12486434
 ] 

Suresh Thalamati commented on DERBY-2020:
-

Thanks for working on this issue, Olav. Your latest patch jvmsyncbug_v2.diff
looks good. A couple of minor comments:

1) One thing that puzzled me is why you are creating a new 
file, rwtest.tmp. Why can't the test be done on the current log file 
itself, by opening the log file in rws mode and then closing it, before
it is opened in the appropriate mode? That way you can avoid creating a
new file and deleting it. 

2) Also, there are these weird read-only db state scenarios. For example, 
if you attempt to create a file when the db is made read-only by putting it
in a jar, Derby is ok as long as there are no transactions pending. If there 
are any pending transactions we may not catch them any more, because the 
jvmsyncError() method attempts to create the rwtest.tmp file very early; it 
may fail immediately on log factory boot and decide all is well and treat 
the database as READONLY. But it will be an inconsistent one. 

I think the Derby test suites have a readonly test, but I am not sure it 
covers the pending transaction error case. 



 Change file option for syncing log file to disk from rws to rwd
 ---

 Key: DERBY-2020
 URL: https://issues.apache.org/jira/browse/DERBY-2020
 Project: Derby
  Issue Type: Improvement
  Components: Performance, Store
Affects Versions: 10.3.0.0
Reporter: Olav Sandstaa
 Assigned To: Olav Sandstaa
 Attachments: disk-cache.png, jvmsyncbug.diff, jvmsyncbug.stat, 
 jvmsyncbug_v2.diff, jvmsyncbug_v2.stat, no-disk-cache.png, rwd.diff, rwd.stat


 For writing the transaction log to disk Derby uses a
 RandomAccessFile. If it is supported by the JVM, the log files are
 opened in rws mode making the file system take care of syncing
 writes to disk. rws mode will ensure that both the data and the file
 meta-data is updated for every write to the file. On some operating
 systems (e.g. Solaris) this leads to two write operation to the disk
 for every write issued by Derby. This is limiting the throughput of
 update intensive applications.  If we could change the file mode to
 rwd this could reduce the number of updates to the disk.
 I have run some simple tests where I have changed mode from rws to
 rwd for the Derby log file. When running a small numbers of
 concurrent client threads the throughput is almost doubled and the
 response time is almost halved. I will attach some graphs that show
 this when running a given number of concurrent tpc-b like clients. These
 graphs show the throughput when running with rws and rwd mode when the
 disk's write cache has been enabled and disabled.
 I am creating this Jira to have a place where we can collect
 information about issues both for and against changing the default
 mode for writing to log files.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: some comments on collation wiki page

2007-04-03 Thread Suresh Thalamati

Mike Matrigali wrote:


Mike Matrigali wrote:


snip


Ok, didn't realize this broke the model.  As long as the info gets down
to store I don't really care how.  So if you can't get the info from
the template we pass down, then we should just add another array 
argument to createConglomerate and createAndLoadConglomerate which would
make it look like (this was the approach taken to pass down the 
columnOrdering which is basically ascend/descend info for indexes):


long createConglomerate(
String  implementation,
DataValueDescriptor[]   template,
ColumnOrdering[]columnOrder,
CollationIds[]          collationIds,
Properties  properties,
int temporaryFlag)
throws StandardException;




I didn't mean to create a new datatype for the collation id's,
I think int or long is fine.

long createConglomerate(
String  implementation,
DataValueDescriptor[]   template,
ColumnOrdering[]columnOrder,
int[]   collationIds,
Properties  properties,
int temporaryFlag)
throws StandardException;



Mike,

Any particular reason why you don't want to add collationIds to the 
already existing ColumnOrdering information, instead of passing them 
as a separate int[] array?
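A rough sketch of that alternative, just to make the question concrete (the 
method names are made up, not the actual Derby interface):

    // sketch: carry the collation id on the existing per-column ordering
    // object instead of passing a parallel int[] array
    public interface ColumnOrdering {
        int getColumnId();         // which column in the template
        boolean getIsAscending();  // ascend/descend info for indexes
        int getCollationId();      // hypothetical addition
    }

    long createConglomerate(
    String                  implementation,
    DataValueDescriptor[]   template,
    ColumnOrdering[]        columnOrder,   // now also carries collation
    Properties              properties,
    int                     temporaryFlag)
    throws StandardException;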


Thanks
-suresh


[jira] Commented: (DERBY-700) Derby does not prevent dual boot of database from different classloaders on Linux

2007-03-28 Thread Suresh Thalamati (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12484998
 ] 

Suresh Thalamati commented on DERBY-700:


Thanks a lot for summarizing the problems and possible solutions 
for this issue, Mike. I think the timer-based solution you mentioned
might work, but I am not comfortable with it. As you
mentioned, users might complain about the background writes, and I also 
think configuring N to the right value to differentiate false 
negative/positive boots is going to be hard. It will depend on the load 
and the machine configuration (number of cpus), etc.

I was trying to find alternative solutions, without much success. The only
solution I could come up with involves using a system property. I 
understand that earlier we discussed using system properties and it was 
decided that this was not such a good idea. But considering that NO better 
solutions have been found for this problem so far, I think having one 
property to maintain a JVMID may not be so bad; the user just needs to give 
security permission to set one property, i.e. if what I 
describe below actually works!

I would really appreciate any suggestions/feedback on this solution. 

My understanding is that a solution to this problem needs to solve primarily 
the following three issues:

1) Maintaining a state that says a database is already booted, if the database
   is booted successfully. 
2) Changing the state to NOT_BOOTED, if it is not booted any more because of:
   a) Shutdown of the database.
   b) The class loader that booted the db is garbage collected.
   c) The JVM exited. 
 
3) Synchronization across class loaders. 

The pseudo code below attempts to solve these problems by making the 
following assumptions:

 1) It is ok to use ONE system property, derby.storage.jvmid, to identify 
a jvm instance. 
 2) It is ok to use interned strings to synchronize across class loaders. 
 3) It is ok to call getCanonicalPath(); I think this may require permission 
for the user.dir property if it is not already required. Another solution
may be to assign an ID string on create of the DB and use that for 
DB-level synchronization. 
 4) It is ok to rely on the class finalizer to clean up the db lock state 
when the database is NOT booted any more because the loader that booted 
the database is garbage collected. 
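To spell out assumption 2: interned strings are shared JVM-wide, so code loaded 
by different class loaders can contend on the same monitor. Roughly 
(illustrative only):

    // dbCannonicalPath.intern() returns the single JVM-wide copy of that
    // string, so two copies of this class loaded by different loaders
    // still synchronize on the same object for the same database path.
    String lockKey = dbCannonicalPath.intern();
    synchronized (lockKey) {
        // read/update the JVMID recorded in dbex.lck for this database
    }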


/*
Pseudo code to lock the DB, to prevent multiple instances of a database running 
concurrently through class loaders in a single instance of the jvm or in 
multiple instances of the jvm.

Note: The following code is shown as a separate class just to understand it 
as a separate issue; it should probably go into the 
dataFactory class where the current db-locking is done. 
*/
class DbLock {

    private static final String DERBY_JVM_ID = "derby.storage.jvmid";
    private String dbCannonicalPath;   // canonical path of the db being booted.
    private FileLock fileLock = null;
    private boolean dbLocked = false;

    DbLock(String dbCannonicalPath) {
        this.dbCannonicalPath = dbCannonicalPath;
    }

    /* 
     * Get a unique JVM ID. 
     */
    private String getJvmId() {
        // synchronize across class loaders.
        synchronized (DERBY_JVM_ID) {

            String jvmid = System.getProperty(DERBY_JVM_ID);
            // if a jvm id does not already exist, generate one 
            // and save it into the derby.storage.jvmid system
            // property.
            if (jvmid == null) {
                // generate a new UUID based on the time and IP, etc. 
                jvmid = generateJvmId();
                System.setProperty(DERBY_JVM_ID, jvmid);
            }
            return jvmid;
        }
    }

    /*
     * Lock the db, so that another class loader or
     * another jvm won't be able to boot the same database.
     */
    public void lock_db_onboot() {

        // Get a file lock on boot(); this already works. 
        fileLock = getFileLock("dbex.lck");
        if (fileLock == null) {
            // not getting the lock means some other jvm has already 
            // booted the db; throw the ALREADY_BOOTED error.
            throw ALREADY_BOOTED;
        } else {

            // The file lock can be acquired even if the database is already 
            // booted by a different class loader. Check whether another class 
            // loader has booted the DB. This is done by checking the 
            // JVMID written in the dbex.lck file. If that JVMID is the same 
            // as what is stored in the system property,
            // then the database is already booted, so throw the error. 
            String currentJvmId = getJvmId();
            synchronized (dbCannonicalPath) {
                String onDiskJvmId = readIdFromDisk(); // read ID from the dbex.lck file.
                if (onDiskJvmId.equals(currentJvmId)) 
                    throw (DATABASE IS ALREADY BOOTED);  
                else {
                    dbLocked = true

[jira] Commented: (DERBY-700) Derby does not prevent dual boot of database from different classloaders on Linux

2007-03-27 Thread Suresh Thalamati (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12484550
 ] 

Suresh Thalamati commented on DERBY-700:


While reading the comments for this issue yet again, I noticed that Rick 
mentioned a long time ago that it might be possible to make the Derby jdbc 
driver hold a state that is global to the jvm, not specific to a class 
loader. Is that how it really works even if the user loads the driver using 
class loaders? 

Basically, is it possible to make org.apache.derby.jdbc.EmbeddedDriver.java 
statically initialize a JVMID (a UUID) that can be accessed from any class 
loader?
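For reference, a plain static field does not give this by itself, because each 
class loader that loads the driver gets its own Class object and its own 
statics. A tiny standalone illustration (not Derby code):

    import java.net.URL;
    import java.net.URLClassLoader;

    public class StaticPerLoaderDemo {
        public static void main(String[] args) throws Exception {
            URL[] jar = { new URL("file:10.1.2.1/derby.jar") };
            // two loaders with no parent delegation, as in the repro program
            ClassLoader l1 = new URLClassLoader(jar, null);
            ClassLoader l2 = new URLClassLoader(jar, null);
            Class<?> c1 = l1.loadClass("org.apache.derby.jdbc.EmbeddedDriver");
            Class<?> c2 = l2.loadClass("org.apache.derby.jdbc.EmbeddedDriver");
            // Different Class objects, hence separate static state per loader.
            System.out.println(c1 == c2);   // prints false
        }
    }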


 Derby does not prevent dual boot of database from different classloaders on 
 Linux
 -

 Key: DERBY-700
 URL: https://issues.apache.org/jira/browse/DERBY-700
 Project: Derby
  Issue Type: Bug
  Components: Store
Affects Versions: 10.1.2.1
 Environment: java -version
 java version 1.4.2_08
 Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_08-b03)
 Java HotSpot(TM) Client VM (build 1.4.2_08-b03, mixed mode)
Reporter: Kathey Marsden
Priority: Critical
 Attachments: DERBY-700.diff, DERBY-700.stat, 
 DERBY-700_v1_use_to_run_DualBootrepro_multithreaded.diff, 
 DERBY-700_v1_use_to_run_DualBootrepro_multithreaded.stat, DualBootRepro.java, 
 DualBootRepro2.zip, DualBootRepro_mutltithreaded.tar.bz2


 Derby does not prevent dual boot from two different classloaders on Linux.
 To reproduce run the  program DualBootRepro with no derby jars in your 
 classpath. The program assumes derby.jar is in 10.1.2.1/derby.jar, you can 
 change the location by changing the DERBY_LIB_DIR variable.
 On Linux the output is:
 $java -cp . DualBootRepro
 Loading derby from file:10.1.2.1/derby.jar
 10.1.2.1/derby.jar
 Booted database in loader [EMAIL PROTECTED]
 FAIL: Booted database in 2nd loader [EMAIL PROTECTED]
 On Windows I get the expected output.
 $ java -cp . DualBootRepro
 Loading derby from file:10.1.2.1/derby.jar
 10.1.2.1/derby.jar
 Booted database in loader [EMAIL PROTECTED]
 PASS: Expected exception for dualboot:Another instance of Derby may have 
 already booted the database D:\marsden\repro\dualboot\mydb.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (DERBY-700) Derby does not prevent dual boot of database from different classloaders on Linux

2007-03-27 Thread Suresh Thalamati (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12484586
 ] 

Suresh Thalamati commented on DERBY-700:


Thanks for confirming, Dan. I was referring to the comment you noted. 
Looks like there is NO way to maintain an in-memory state that is global to 
the JVM across class loaders, other than using a system property.

 Derby does not prevent dual boot of database from different classloaders on 
 Linux
 -

 Key: DERBY-700
 URL: https://issues.apache.org/jira/browse/DERBY-700
 Project: Derby
  Issue Type: Bug
  Components: Store
Affects Versions: 10.1.2.1
 Environment: java -version
 java version 1.4.2_08
 Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_08-b03)
 Java HotSpot(TM) Client VM (build 1.4.2_08-b03, mixed mode)
Reporter: Kathey Marsden
Priority: Critical
 Attachments: DERBY-700.diff, DERBY-700.stat, 
 DERBY-700_v1_use_to_run_DualBootrepro_multithreaded.diff, 
 DERBY-700_v1_use_to_run_DualBootrepro_multithreaded.stat, DualBootRepro.java, 
 DualBootRepro2.zip, DualBootRepro_mutltithreaded.tar.bz2


 Derby does not prevent dual boot from two different classloaders on Linux.
 To reproduce run the  program DualBootRepro with no derby jars in your 
 classpath. The program assumes derby.jar is in 10.1.2.1/derby.jar, you can 
 change the location by changing the DERBY_LIB_DIR variable.
 On Linux the output is:
 $java -cp . DualBootRepro
 Loading derby from file:10.1.2.1/derby.jar
 10.1.2.1/derby.jar
 Booted database in loader [EMAIL PROTECTED]
 FAIL: Booted database in 2nd loader [EMAIL PROTECTED]
 On Windows I get the expected output.
 $ java -cp . DualBootRepro
 Loading derby from file:10.1.2.1/derby.jar
 10.1.2.1/derby.jar
 Booted database in loader [EMAIL PROTECTED]
 PASS: Expected exception for dualboot:Another instance of Derby may have 
 already booted the database D:\marsden\repro\dualboot\mydb.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


