[jira] [Commented] (SPARK-6800) Reading from JDBC with SQLContext, using lower/upper bounds and numPartitions gives incorrect results.
[ https://issues.apache.org/jira/browse/SPARK-6800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14495945#comment-14495945 ]
Liang-Chi Hsieh commented on SPARK-6800:
----------------------------------------
PR: https://github.com/apache/spark/pull/5488

> Reading from JDBC with SQLContext, using lower/upper bounds and numPartitions gives incorrect results.
> ------------------------------------------------------------------------------------------------------
>
>              Key: SPARK-6800
>              URL: https://issues.apache.org/jira/browse/SPARK-6800
>          Project: Spark
>       Issue Type: Bug
>       Components: SQL
> Affects Versions: 1.3.0
>      Environment: Windows 8.1, Apache Derby DB, Spark 1.3.0 CDH5.4.0, Scala 2.10
>         Reporter: Micael Capitão
>
> Having a Derby table with people info (id, name, age) defined like this:
> {code}
> val jdbcUrl = "jdbc:derby:memory:PeopleDB;create=true"
> val conn = DriverManager.getConnection(jdbcUrl)
> val stmt = conn.createStatement()
> stmt.execute("CREATE TABLE Person (person_id INT NOT NULL GENERATED ALWAYS AS IDENTITY CONSTRAINT person_pk PRIMARY KEY, name VARCHAR(50), age INT)")
> stmt.execute("INSERT INTO Person(name, age) VALUES('Armando Carvalho', 50)")
> stmt.execute("INSERT INTO Person(name, age) VALUES('Lurdes Pereira', 23)")
> stmt.execute("INSERT INTO Person(name, age) VALUES('Ana Rita Costa', 12)")
> stmt.execute("INSERT INTO Person(name, age) VALUES('Armando Pereira', 32)")
> stmt.execute("INSERT INTO Person(name, age) VALUES('Miguel Costa', 15)")
> stmt.execute("INSERT INTO Person(name, age) VALUES('Anabela Sintra', 13)")
> {code}
> If I try to read that table from Spark SQL with lower/upper bounds, like this:
> {code}
> val people = sqlContext.jdbc(url = jdbcUrl, table = "Person",
>   columnName = "age", lowerBound = 0, upperBound = 40, numPartitions = 10)
> people.show()
> {code}
> I get this result:
> {noformat}
> PERSON_ID  NAME              AGE
> 3          Ana Rita Costa    12
> 5          Miguel Costa      15
> 6          Anabela Sintra    13
> 2          Lurdes Pereira    23
> 4          Armando Pereira   32
> 1          Armando Carvalho  50
> {noformat}
> Which is wrong, considering the defined upper bound has been ignored (I get a person with age 50!).
> Digging into the code, I've found that in {{JDBCRelation.columnPartition}} the WHERE clauses it generates are the following:
> {code}
> (0) age < 4,0
> (1) age >= 4 AND age < 8,1
> (2) age >= 8 AND age < 12,2
> (3) age >= 12 AND age < 16,3
> (4) age >= 16 AND age < 20,4
> (5) age >= 20 AND age < 24,5
> (6) age >= 24 AND age < 28,6
> (7) age >= 28 AND age < 32,7
> (8) age >= 32 AND age < 36,8
> (9) age >= 36,9
> {code}
> The last condition ignores the upper bound, and the other ones may result in repeated rows being read.
> Using the JdbcRDD (and converting it to a DataFrame) I would have something like this:
> {code}
> val jdbcRdd = new JdbcRDD(sc, () => DriverManager.getConnection(jdbcUrl),
>   "SELECT * FROM Person WHERE age >= ? and age <= ?", 0, 40, 10,
>   rs => (rs.getInt(1), rs.getString(2), rs.getInt(3)))
> val people = jdbcRdd.toDF("PERSON_ID", "NAME", "AGE")
> people.show()
> {code}
> Resulting in:
> {noformat}
> PERSON_ID  NAME             AGE
> 3          Ana Rita Costa   12
> 5          Miguel Costa     15
> 6          Anabela Sintra   13
> 2          Lurdes Pereira   23
> 4          Armando Pereira  32
> {noformat}
> Which is correct!
> Confirming the WHERE clauses generated by the JdbcRDD in {{getPartitions}}, I've found it generates the following:
> {code}
> (0) age >= 0 AND age <= 3
> (1) age >= 4 AND age <= 7
> (2) age >= 8 AND age <= 11
> (3) age >= 12 AND age <= 15
> (4) age >= 16 AND age <= 19
> (5) age >= 20 AND age <= 23
> (6) age >= 24 AND age <= 27
> (7) age >= 28 AND age <= 31
> (8) age >= 32 AND age <= 35
> (9) age >= 36 AND age <= 40
> {code}
> This is the behaviour I was expecting from the Spark SQL version. Is the Spark SQL version buggy, or is this some weird expected behaviour?

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
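For readers following the thread, the clause list reported above can be reproduced with a small stride-based sketch. This is a simplified illustration, not the actual {{JDBCRelation.columnPartition}} source; the function name and structure are hypothetical, but the arithmetic (stride = (upperBound - lowerBound) / numPartitions, with an open-ended first and last partition) matches the generated clauses:

```scala
// Hypothetical sketch of stride-based WHERE clause generation,
// NOT the actual Spark source. With lower = 0, upper = 40, n = 10
// it reproduces the clause list from the report.
def columnPartitionSketch(column: String, lower: Long, upper: Long,
                          numPartitions: Int): Seq[String] = {
  val stride = (upper - lower) / numPartitions
  (0 until numPartitions).map { i =>
    // Partition 0 gets no lower bound; the last partition gets no upper bound.
    val lowerClause =
      if (i > 0) Some(s"$column >= ${lower + i * stride}") else None
    val upperClause =
      if (i < numPartitions - 1) Some(s"$column < ${lower + (i + 1) * stride}") else None
    Seq(lowerClause, upperClause).flatten.mkString(" AND ")
  }
}

val clauses = columnPartitionSketch("age", 0L, 40L, 10)
// clauses.head == "age < 4", clauses.last == "age >= 36"
```

Note how partition 0 has no lower bound and partition 9 has no upper bound, which is exactly why the age-50 row is still read.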
[ https://issues.apache.org/jira/browse/SPARK-6800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14495940#comment-14495940 ]
Micael Capitão commented on SPARK-6800:
---------------------------------------
Could you please point me to that page? I'm a bit confused, because not considering the upper/lower bounds would cause the whole table to be fetched, leading to the issue I've reported here, in which a person with an age out of bounds is retrieved. Thanks.
[ https://issues.apache.org/jira/browse/SPARK-6800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14495928#comment-14495928 ]
Liang-Chi Hsieh commented on SPARK-6800:
----------------------------------------
Regarding the upper and lower bounds issue, please refer to the PR page, where Michael Armbrust explains why it is not a bug. Thanks.
[ https://issues.apache.org/jira/browse/SPARK-6800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14495926#comment-14495926 ]
Micael Capitão commented on SPARK-6800:
---------------------------------------
My fault, I'm sorry. I looked in the wrong place when retrieving the WHERE clauses. {{age >= 8 AND age < 12,2}} means where = "age >= 8 AND age < 12" and partition = 2. Fixing the lower/upper bounds issue is all that is needed.
[ https://issues.apache.org/jira/browse/SPARK-6800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14495921#comment-14495921 ]
Liang-Chi Hsieh commented on SPARK-6800:
----------------------------------------
The ranges for partitions 1 to 8 do not overlap. They are:
{noformat}
(1) 8 > age >= 4
(2) 12 > age >= 8
(3) 16 > age >= 12
(4) 20 > age >= 16
(5) 24 > age >= 20
(6) 28 > age >= 24
(7) 32 > age >= 28
(8) 36 > age >= 32
{noformat}
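This point can be checked mechanically. The sketch below (illustrative Scala, not Spark code) encodes the ten generated clauses as predicates and verifies that every age matches exactly one of them, i.e. the half-open ranges are pairwise disjoint and exhaustive, so no row can be read twice:

```scala
// The ten WHERE clauses from JDBCRelation.columnPartition, written as
// predicates. Each is half-open ([lo, hi)), so consecutive ranges share
// a boundary value without overlapping.
val clauses: Seq[Int => Boolean] = Seq(
  a => a < 4,
  a => a >= 4 && a < 8,
  a => a >= 8 && a < 12,
  a => a >= 12 && a < 16,
  a => a >= 16 && a < 20,
  a => a >= 20 && a < 24,
  a => a >= 24 && a < 28,
  a => a >= 28 && a < 32,
  a => a >= 32 && a < 36,
  a => a >= 36
)

// Every age matches exactly one clause, so no duplicates and no gaps:
assert((0 to 100).forall(age => clauses.count(_(age)) == 1))
```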
[ https://issues.apache.org/jira/browse/SPARK-6800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14495908#comment-14495908 ]
Micael Capitão commented on SPARK-6800:
---------------------------------------
In the example I've provided there are no repeated rows read, but that is only because of my dataset. If the queries generated are like this:
{code}
(0) age < 4,0
(1) age >= 4 AND age < 8,1
(2) age >= 8 AND age < 12,2
(3) age >= 12 AND age < 16,3
(4) age >= 16 AND age < 20,4
(5) age >= 20 AND age < 24,5
(6) age >= 24 AND age < 28,6
(7) age >= 28 AND age < 32,7
(8) age >= 32 AND age < 36,8
(9) age >= 36,9
{code}
the ranges for partitions 1 to 8 overlap. In a real, more complex scenario, that would put the same record in multiple partitions. The title of the issue does not state that problem, but the description does...
[ https://issues.apache.org/jira/browse/SPARK-6800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14495570#comment-14495570 ]
Liang-Chi Hsieh commented on SPARK-6800:
----------------------------------------
And according to the explanation from Michael Armbrust, {{lowerBound}} and {{upperBound}} are just used to decide the partition stride, not to filter rows. So all table rows are partitioned, and this is not a bug. But of course the documentation needs to be updated to clearly state this.
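A minimal sketch of these semantics, assuming the clause list reported in the description: values outside [lowerBound, upperBound] still satisfy the open-ended first or last clause, so they are read anyway, and any actual restriction has to be applied separately:

```scala
// The open-ended boundary clauses from the report, as predicates
// (illustrative, not Spark code):
val firstPartition: Int => Boolean = age => age < 4    // clause "(0) age < 4"
val lastPartition: Int => Boolean = age => age >= 36   // clause "(9) age >= 36"

// Age 50 is above upperBound = 40, but still lands in the last partition,
// which is why 'Armando Carvalho' appears in the result:
assert(lastPartition(50))

// Likewise, a hypothetical row below lowerBound would land in partition 0:
assert(firstPartition(-5))

// To actually restrict the rows, filter the resulting DataFrame, e.g.
// (hypothetical call mirroring the reporter's example):
//   people.filter("age >= 0 AND age <= 40")
```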
[ https://issues.apache.org/jira/browse/SPARK-6800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14495564#comment-14495564 ]
Liang-Chi Hsieh commented on SPARK-6800:
----------------------------------------
Why would the other ones result in repeated rows being read?
[ https://issues.apache.org/jira/browse/SPARK-6800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492364#comment-14492364 ] Micael Capitão commented on SPARK-6800:

The above pull request seems to fix only the upper and lower bounds issue. There is still the intermediate-queries issue, which may result in repeated rows being fetched from the DB.
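The inclusive ranges quoted from JdbcRDD's {{getPartitions}} can be checked with a sketch of its splitting arithmetic. This is an illustrative Python version under the assumption that the ranges are computed by integer division over the inclusive span, not the JdbcRDD Scala source:

```python
def jdbc_rdd_partitions(lower, upper, num_partitions):
    """Sketch of JdbcRDD-style partitioning: split the inclusive range
    [lower, upper] into contiguous inclusive sub-ranges.  Each partition's
    start is the previous partition's end + 1, so no value falls into two
    partitions and none is skipped."""
    length = 1 + upper - lower  # number of distinct values in [lower, upper]
    parts = []
    for i in range(num_partitions):
        start = lower + (i * length) // num_partitions
        end = lower + ((i + 1) * length) // num_partitions - 1
        parts.append((start, end))
    return parts

parts = jdbc_rdd_partitions(0, 40, 10)
print(parts[0], parts[-1])  # (0, 3) (36, 40)
```

For bounds 0/40 and 10 partitions this reproduces the ten `age >= s AND age <= e` clauses listed in the report, and the end + 1 == next start property shows why these clauses cannot fetch repeated rows.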
[ https://issues.apache.org/jira/browse/SPARK-6800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492244#comment-14492244 ] Apache Spark commented on SPARK-6800:

User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/5488