[jira] [Commented] (SPARK-19692) Comparison on BinaryType has incorrect results

2017-02-22 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15878443#comment-15878443
 ] 

Sean Owen commented on SPARK-19692:
---

Bytes are signed in the JVM, and thus in Scala and Java. It's always been this 
way everywhere and isn't specific to Spark. As a byte, 0x8C is a way of writing 
-116, not a positive value.
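
For reference, a minimal plain-Scala sketch of the signedness point (no Spark involved; the object name is made up for illustration). It shows the two readings of the same bit pattern: the sign-extended value the JVM uses by default, and the 0..255 value recovered by masking.

```scala
object SignedByteDemo {
  // The Int literal 0x8C (140) narrowed to a signed 8-bit Byte.
  val b: Byte = 0x8C.toByte

  // Widening a Byte to Int sign-extends it.
  val signed: Int = b.toInt

  // Masking with 0xff keeps only the low 8 bits, i.e. the unsigned reading.
  val unsigned: Int = b & 0xff

  def main(args: Array[String]): Unit =
    println(s"signed=$signed unsigned=$unsigned") // signed=-116 unsigned=140
}
```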

> Comparison on BinaryType has incorrect results
> ----------------------------------------------
>
> Key: SPARK-19692
> URL: https://issues.apache.org/jira/browse/SPARK-19692
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.1.0
> Reporter: Don Smith
>
> I believe there is an issue with comparisons on binary fields:
> {code}
>   import java.net.InetAddress
>   import org.apache.spark.sql.{Row, SparkSession}
>   import org.apache.spark.sql.types._
>
>   val sc = SparkSession.builder.appName("test").getOrCreate()
>   val schema = StructType(Seq(StructField("ip", BinaryType)))
>   val ips = Seq("1.1.1.1", "2.2.2.2", "200.10.6.7").map(s =>
>     InetAddress.getByName(s).getAddress)
>   val df = sc.createDataFrame(
>     sc.sparkContext.parallelize(ips, 1).map { ip =>
>       Row(ip)
>     }, schema
>   )
>   val query = df
>     .where(df("ip") >= InetAddress.getByName("200.10.0.0").getAddress)
>     .where(df("ip") <= InetAddress.getByName("200.10.255.255").getAddress)
>   query.explain(true)  // explain prints the plans itself and returns Unit
>   val results = query.collect()
>   results.length mustEqual 1
> {code}
> This returns no results.
> I believe the problem is that the comparison is coercing the bytes to signed
> integers in the call to compareTo here in TypeUtils:
> {code}
>   def compareBinary(x: Array[Byte], y: Array[Byte]): Int = {
>     for (i <- 0 until x.length; if i < y.length) {
>       val res = x(i).compareTo(y(i))
>       if (res != 0) return res
>     }
>     x.length - y.length
>   }
> {code}
> With some hacky testing I was able to get the desired results with:
> {code}
>   val res = (x(i).toByte & 0xff) - (y(i).toByte & 0xff)
> {code}
> Thanks!
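
The masking idea in the description can be sketched as a standalone comparator. This is only an illustration under stated assumptions: `compareBinaryUnsigned` and the enclosing object are hypothetical names, not part of Spark's TypeUtils, and nothing here claims to be how Spark does or should resolve the issue. The addresses are built from Int literals so the sketch needs no name lookups.

```scala
object UnsignedBinaryCompare {
  // Hypothetical helper (not Spark's TypeUtils): lexicographic comparison
  // that reads each byte as an unsigned value in 0..255.
  def compareBinaryUnsigned(x: Array[Byte], y: Array[Byte]): Int = {
    val limit = math.min(x.length, y.length)
    var i = 0
    while (i < limit) {
      // Both operands are in 0..255, so the subtraction cannot overflow.
      val res = (x(i) & 0xff) - (y(i) & 0xff)
      if (res != 0) return res
      i += 1
    }
    // All shared positions are equal: the shorter array sorts first.
    x.length - y.length
  }

  // Builds a byte array from octets written as 0..255 Int values.
  def octets(values: Int*): Array[Byte] = values.map(_.toByte).toArray

  def main(args: Array[String]): Unit = {
    val ip    = octets(200, 10, 6, 7)
    val lower = octets(200, 10, 0, 0)
    val upper = octets(200, 10, 255, 255)
    val inRange =
      compareBinaryUnsigned(ip, lower) >= 0 && compareBinaryUnsigned(ip, upper) <= 0
    println(s"200.10.6.7 within 200.10.0.0 .. 200.10.255.255: $inRange")
  }
}
```

Subtracting the masked values is safe here only because both sides are already limited to 0..255; the same trick on full Int values could overflow.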



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19692) Comparison on BinaryType has incorrect results

2017-02-22 Thread Don Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15878431#comment-15878431
 ] 

Don Smith commented on SPARK-19692:


An even more trivial example:
{code}
  val sc = SparkSession.builder.appName("test").getOrCreate()
  val schema = StructType(Seq(StructField("byte", BinaryType)))

  val byte = Seq(Array(0x8C.toByte))

  val df = sc.createDataFrame(
    sc.sparkContext.parallelize(byte, 1).map { ip =>
      SQLRow(ip)  // SQLRow is presumably an alias for org.apache.spark.sql.Row
    }, schema
  )

  df.show()  // show prints the rows itself and returns Unit

  val query = df
    .where(df("byte") >= Array(0x00.toByte))
    .where(df("byte") <= Array(0xFF.toByte))

  query.explain(true)  // explain prints the plans itself and returns Unit
  val results = query.collect()
  results.length mustEqual 1
{code}

I'm having trouble believing this is the expected behavior. If it is, is it 
defined somewhere?




[jira] [Commented] (SPARK-19692) Comparison on BinaryType has incorrect results

2017-02-22 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15878202#comment-15878202
 ] 

Sean Owen commented on SPARK-19692:
---

That doesn't sound like a bug. Bytes are signed in Java. If you want to 
interpret them otherwise you'd need to convert them or provide a different 
comparison.
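
One way to read "convert them" for the IPv4 case is to pack each address into a non-negative Long before comparing, so that ordinary numeric ordering matches address ordering. The sketch below is an assumption about what such a conversion could look like; `ipv4ToLong` and the object name are hypothetical, and whether the result is then stored in a numeric column instead of a binary one is left to the reader.

```scala
object Ipv4AsLong {
  // Hypothetical helper: packs a 4-byte IPv4 address (network byte order)
  // into a Long in the range 0 .. 2^32 - 1.
  def ipv4ToLong(addr: Array[Byte]): Long = {
    require(addr.length == 4, "expected a 4-byte IPv4 address")
    // Mask each byte with 0xffL so it contributes 0..255 rather than -128..127.
    addr.foldLeft(0L)((acc, b) => (acc << 8) | (b & 0xffL))
  }

  def main(args: Array[String]): Unit = {
    val ip    = ipv4ToLong(Array(200, 10, 6, 7).map(_.toByte))
    val lower = ipv4ToLong(Array(200, 10, 0, 0).map(_.toByte))
    val upper = ipv4ToLong(Array(200, 10, 255, 255).map(_.toByte))
    println(s"in range: ${lower <= ip && ip <= upper}")
  }
}
```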



[jira] [Commented] (SPARK-19692) Comparison on BinaryType has incorrect results

2017-02-22 Thread Takeshi Yamamuro (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15877869#comment-15877869
 ] 

Takeshi Yamamuro commented on SPARK-19692:
--

ISTM the query is correct; am I missing something?

{code}
scala> import java.net.InetAddress
scala> import org.apache.spark.sql.types._
scala> val df = Seq("1.1.1.1", "2.2.2.2", "200.10.6.7").map(d => Tuple1(InetAddress.getByName(d).getAddress)).toDF("ip")
df: org.apache.spark.sql.DataFrame = [ip: binary]

scala> df.where($"ip" >= InetAddress.getByName("200.10.0.0").getAddress).show
+-------------+
|           ip|
+-------------+
|[01 01 01 01]|
|[02 02 02 02]|
|[C8 0A 06 07]|
+-------------+

scala> df.where($"ip" <= InetAddress.getByName("200.10.255.255").getAddress).show
+---+
| ip|
+---+
+---+
{code}
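
The two result sets above are consistent with a byte-by-byte signed comparison. A minimal plain-Scala sketch, assuming the comparison behaves like the compareBinary snippet quoted in the issue description (the object and method names are hypothetical, and no Spark is involved): 0xC8 reads as -56 and 0xFF as -1, so every row compares greater than or equal to the lower bound and none compares less than or equal to the upper bound.

```scala
object SignedCompareWalkthrough {
  // Same logic as the compareBinary snippet quoted in the issue description,
  // written with a while loop; each byte is compared as a signed value.
  def compareBinarySigned(x: Array[Byte], y: Array[Byte]): Int = {
    val limit = math.min(x.length, y.length)
    var i = 0
    while (i < limit) {
      val res = java.lang.Byte.compare(x(i), y(i))
      if (res != 0) return res
      i += 1
    }
    x.length - y.length
  }

  // Builds a byte array from octets written as 0..255 Int values.
  def octets(values: Int*): Array[Byte] = values.map(_.toByte).toArray

  val rows: Seq[Array[Byte]] =
    Seq(octets(1, 1, 1, 1), octets(2, 2, 2, 2), octets(200, 10, 6, 7))
  val lower: Array[Byte] = octets(200, 10, 0, 0)     // C8 0A 00 00
  val upper: Array[Byte] = octets(200, 10, 255, 255) // C8 0A FF FF

  // Rows that pass each filter on its own under signed comparison.
  val geLower: Seq[Array[Byte]] = rows.filter(r => compareBinarySigned(r, lower) >= 0)
  val leUpper: Seq[Array[Byte]] = rows.filter(r => compareBinarySigned(r, upper) <= 0)

  def main(args: Array[String]): Unit =
    println(s">= lower: ${geLower.size} rows, <= upper: ${leUpper.size} rows")
}
```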
