[jira] [Created] (SPARK-19710) Test Failures in SQLQueryTests on big endian platforms

2017-02-23 Thread Pete Robbins (JIRA)
Pete Robbins created SPARK-19710:


 Summary: Test Failures in SQLQueryTests on big endian platforms
 Key: SPARK-19710
 URL: https://issues.apache.org/jira/browse/SPARK-19710
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.2.0
Reporter: Pete Robbins
Priority: Minor


Some of the new test queries introduced by 
https://issues.apache.org/jira/browse/SPARK-18871 fail when run on zLinux (big 
endian)

The order of the returned rows differs from that in the expected results file, hence the 
failures, but the results are still valid for the queries because the ORDER BY clauses do 
not specify enough columns to make the row order deterministic.

The failing tests are in o.a.s.SQLQueryTestSuite:
in-joins.sql
not-in-joins.sql
in-set-operations.sql

These can be fixed by adding columns to the ORDER BY clauses so that the resulting 
row order is fully determined.
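
As a rough illustration (hypothetical query and tables, assuming an existing SparkSession named spark; not one of the actual test queries), ordering only on a non-unique key leaves tied rows in a platform-dependent order, while extending the ORDER BY makes the comparison against the golden result file stable:

{code}
// `spark` is assumed to be an existing SparkSession with tables t1 and t2 registered.
// An ORDER BY that does not uniquely determine the sort key leaves tied rows in an
// arbitrary, platform-dependent order, so the comparison against the golden result
// file can fail even though the answer set is correct.
val ambiguous = spark.sql(
  "SELECT t1.a, t2.b FROM t1 JOIN t2 ON t1.a = t2.a ORDER BY t1.a")

// Adding the remaining output column to the ORDER BY makes the row order
// deterministic on every platform, including big endian.
val deterministic = spark.sql(
  "SELECT t1.a, t2.b FROM t1 JOIN t2 ON t1.a = t2.a ORDER BY t1.a, t2.b")
{code}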

PR on its way






[jira] [Created] (SPARK-18963) Test Failure on big endian; o.a.s.unsafe.types.UTF8StringSuite.writeToOutputStreamIntArray

2016-12-21 Thread Pete Robbins (JIRA)
Pete Robbins created SPARK-18963:


 Summary: Test Failure on big endian; 
o.a.s.unsafe.types.UTF8StringSuite.writeToOutputStreamIntArray
 Key: SPARK-18963
 URL: https://issues.apache.org/jira/browse/SPARK-18963
 Project: Spark
  Issue Type: Bug
  Components: Tests
Affects Versions: 2.1.1
Reporter: Pete Robbins
Priority: Minor


SPARK-18658 introduced a new test which flips a ByteBuffer into little-endian 
order. This is not necessary on a big-endian platform and results in:

writeToOutputStreamIntArray(org.apache.spark.unsafe.types.UTF8StringSuite)  
Time elapsed: 0.01 sec  <<< FAILURE!
org.junit.ComparisonFailure: expected:<[大千世界]> but was:<[姤�䃍���]>
at 
org.apache.spark.unsafe.types.UTF8StringSuite.writeToOutputStreamIntArray(UTF8StringSuite.java:609)
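
For reference, a minimal Scala sketch of the underlying byte-order pitfall (hypothetical code, not the actual UTF8StringSuite test): ints read out of a byte buffer must be written back with the same byte order they were read with; hard-coding LITTLE_ENDIAN on one side byte-swaps every int on a big-endian JVM and corrupts the string into garbage like the output above.

{code}
import java.nio.{ByteBuffer, ByteOrder}

// Hypothetical sketch of the failure mode, not the actual test code.
val original = "大千世界"
val utf8 = original.getBytes("UTF-8")                 // 12 bytes => 3 ints

// Read the bytes as ints using the platform's native order.
val asInts = ByteBuffer.wrap(utf8).order(ByteOrder.nativeOrder()).asIntBuffer()

// Write them back out with the SAME order. Forcing ByteOrder.LITTLE_ENDIAN here
// instead would byte-swap every int on a big-endian JVM and corrupt the string.
val out = ByteBuffer.allocate(utf8.length).order(ByteOrder.nativeOrder())
while (asInts.hasRemaining) out.putInt(asInts.get())

assert(new String(out.array(), "UTF-8") == original)
{code}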


PR on its way






[jira] [Commented] (SPARK-17827) StatisticsColumnSuite failures on big endian platforms

2016-10-13 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15571890#comment-15571890
 ] 

Pete Robbins commented on SPARK-17827:
--

I have a PR ready, which I will submit as soon as I have run the tests on both 
big- and little-endian platforms.

> StatisticsColumnSuite failures on big endian platforms
> --
>
> Key: SPARK-17827
> URL: https://issues.apache.org/jira/browse/SPARK-17827
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
> Environment: big endian
>Reporter: Pete Robbins
>  Labels: big-endian
>
> https://issues.apache.org/jira/browse/SPARK-17073
> introduces new tests/function that fails on big endian platforms
> Failing tests:
>  org.apache.spark.sql.StatisticsColumnSuite.column-level statistics for 
> string column
>  org.apache.spark.sql.StatisticsColumnSuite.column-level statistics for 
> binary column
>  org.apache.spark.sql.StatisticsColumnSuite.column-level statistics for 
> columns with different types
>  org.apache.spark.sql.hive.StatisticsSuite.generate column-level statistics 
> and load them from hive metastore
> all fail in checkColStat eg: 
> java.lang.AssertionError: assertion failed
>   at scala.Predef$.assert(Predef.scala:156)
>   at 
> org.apache.spark.sql.StatisticsTest$.checkColStat(StatisticsTest.scala:92)
>   at 
> org.apache.spark.sql.StatisticsTest$$anonfun$checkColStats$1$$anonfun$apply$mcV$sp$1.apply(StatisticsTest.scala:43)
>   at 
> org.apache.spark.sql.StatisticsTest$$anonfun$checkColStats$1$$anonfun$apply$mcV$sp$1.apply(StatisticsTest.scala:40)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at 
> org.apache.spark.sql.StatisticsTest$$anonfun$checkColStats$1.apply$mcV$sp(StatisticsTest.scala:40)
>   at 
> org.apache.spark.sql.test.SQLTestUtils$class.withTable(SQLTestUtils.scala:168)
>   at 
> org.apache.spark.sql.StatisticsColumnSuite.withTable(StatisticsColumnSuite.scala:30)
>   at 
> org.apache.spark.sql.StatisticsTest$class.checkColStats(StatisticsTest.scala:33)
>   at 
> org.apache.spark.sql.StatisticsColumnSuite.checkColStats(StatisticsColumnSuite.scala:30)
>   at 
> org.apache.spark.sql.StatisticsColumnSuite$$anonfun$7.apply$mcV$sp(StatisticsColumnSuite.scala:171)
>   at 
> org.apache.spark.sql.StatisticsColumnSuite$$anonfun$7.apply(StatisticsColumnSuite.scala:160)
>   at 
> org.apache.spark.sql.StatisticsColumnSuite$$anonfun$7.apply(StatisticsColumnSuite.scala:160)






[jira] [Commented] (SPARK-17827) StatisticsColumnSuite failures on big endian platforms

2016-10-13 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15571546#comment-15571546
 ] 

Pete Robbins commented on SPARK-17827:
--

Right, so in these two cases maxLength in AnalyzeColumnCommand returns an Int, 
and I guess in other cases it could be a Long?

> StatisticsColumnSuite failures on big endian platforms
> --
>
> Key: SPARK-17827
> URL: https://issues.apache.org/jira/browse/SPARK-17827
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
> Environment: big endian
>Reporter: Pete Robbins
>  Labels: big-endian
>
> https://issues.apache.org/jira/browse/SPARK-17073
> introduces new tests/function that fails on big endian platforms
> Failing tests:
>  org.apache.spark.sql.StatisticsColumnSuite.column-level statistics for 
> string column
>  org.apache.spark.sql.StatisticsColumnSuite.column-level statistics for 
> binary column
>  org.apache.spark.sql.StatisticsColumnSuite.column-level statistics for 
> columns with different types
>  org.apache.spark.sql.hive.StatisticsSuite.generate column-level statistics 
> and load them from hive metastore
> all fail in checkColStat eg: 
> java.lang.AssertionError: assertion failed
>   at scala.Predef$.assert(Predef.scala:156)
>   at 
> org.apache.spark.sql.StatisticsTest$.checkColStat(StatisticsTest.scala:92)
>   at 
> org.apache.spark.sql.StatisticsTest$$anonfun$checkColStats$1$$anonfun$apply$mcV$sp$1.apply(StatisticsTest.scala:43)
>   at 
> org.apache.spark.sql.StatisticsTest$$anonfun$checkColStats$1$$anonfun$apply$mcV$sp$1.apply(StatisticsTest.scala:40)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at 
> org.apache.spark.sql.StatisticsTest$$anonfun$checkColStats$1.apply$mcV$sp(StatisticsTest.scala:40)
>   at 
> org.apache.spark.sql.test.SQLTestUtils$class.withTable(SQLTestUtils.scala:168)
>   at 
> org.apache.spark.sql.StatisticsColumnSuite.withTable(StatisticsColumnSuite.scala:30)
>   at 
> org.apache.spark.sql.StatisticsTest$class.checkColStats(StatisticsTest.scala:33)
>   at 
> org.apache.spark.sql.StatisticsColumnSuite.checkColStats(StatisticsColumnSuite.scala:30)
>   at 
> org.apache.spark.sql.StatisticsColumnSuite$$anonfun$7.apply$mcV$sp(StatisticsColumnSuite.scala:171)
>   at 
> org.apache.spark.sql.StatisticsColumnSuite$$anonfun$7.apply(StatisticsColumnSuite.scala:160)
>   at 
> org.apache.spark.sql.StatisticsColumnSuite$$anonfun$7.apply(StatisticsColumnSuite.scala:160)






[jira] [Comment Edited] (SPARK-17827) StatisticsColumnSuite failures on big endian platforms

2016-10-13 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15571348#comment-15571348
 ] 

Pete Robbins edited comment on SPARK-17827 at 10/13/16 9:11 AM:


In Statistics.scala

{code}
case class StringColumnStat(statRow: InternalRow) {
  println("StringColumnStat: " + statRow)
  // The indices here must be consistent with `ColumnStatStruct.stringColumnStat`.
  val numNulls: Long = statRow.getLong(0)
  val avgColLen: Double = statRow.getDouble(1)
  val maxColLen: Long = statRow.getLong(2)   // << Actual type in statRow is Int
  val ndv: Long = statRow.getLong(3)
}

case class BinaryColumnStat(statRow: InternalRow) {
  // The indices here must be consistent with `ColumnStatStruct.binaryColumnStat`.
  val numNulls: Long = statRow.getLong(0)
  val avgColLen: Double = statRow.getDouble(1)
  val maxColLen: Long = statRow.getLong(2)   // << Actual type in statRow is Int
}

{code}

So either the code above should be using getInt for the maxColLen or the code 
generating the row should be creating a Long
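
A tiny Scala sketch of why the Int/Long mismatch only surfaces on big-endian hardware (illustrative buffer and values, not the actual UnsafeRow layout):

{code}
import java.nio.{ByteBuffer, ByteOrder}

// Illustrative only: an 8-byte slot written with putInt but read back with
// getLong, mirroring the maxColLen mismatch above (not the real UnsafeRow).
def writeIntReadLong(order: ByteOrder): Long = {
  val slot = ByteBuffer.allocate(8).order(order)  // zero-filled 8-byte slot
  slot.putInt(0, 42)                              // writer stores an Int
  slot.getLong(0)                                 // reader expects a Long
}

// Little endian: the 4 written bytes are the low-order bytes of the long and
// the rest are zero, so the value happens to come back unchanged.
assert(writeIntReadLong(ByteOrder.LITTLE_ENDIAN) == 42L)

// Big endian: the same 4 bytes are the high-order bytes, so the reader sees
// 42 << 32 and the column-statistics assertion fails.
assert(writeIntReadLong(ByteOrder.BIG_ENDIAN) == (42L << 32))
{code}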


was (Author: robbinspg):
In Statistics.scala

case class StringColumnStat(statRow: InternalRow) {
  println("StringColumnStat: " + statRow)
  // The indices here must be consistent with 
`ColumnStatStruct.stringColumnStat`.
  val numNulls: Long = statRow.getLong(0)
  val avgColLen: Double = statRow.getDouble(1)
  val maxColLen: Long = statRow.getLong(2)   << Actual type in 
statRow is Int
  val ndv: Long = statRow.getLong(3)
}

case class BinaryColumnStat(statRow: InternalRow) {
  // The indices here must be consistent with 
`ColumnStatStruct.binaryColumnStat`.
  val numNulls: Long = statRow.getLong(0)
  val avgColLen: Double = statRow.getDouble(1)
  val maxColLen: Long = statRow.getLong(2)<< Actual type in 
statRow is Int
}

So either the code above should be using getInt for the maxColLen or the code 
generating the row should be creating a Long

> StatisticsColumnSuite failures on big endian platforms
> --
>
> Key: SPARK-17827
> URL: https://issues.apache.org/jira/browse/SPARK-17827
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
> Environment: big endian
>Reporter: Pete Robbins
>  Labels: big-endian
>
> https://issues.apache.org/jira/browse/SPARK-17073
> introduces new tests/function that fails on big endian platforms
> Failing tests:
>  org.apache.spark.sql.StatisticsColumnSuite.column-level statistics for 
> string column
>  org.apache.spark.sql.StatisticsColumnSuite.column-level statistics for 
> binary column
>  org.apache.spark.sql.StatisticsColumnSuite.column-level statistics for 
> columns with different types
>  org.apache.spark.sql.hive.StatisticsSuite.generate column-level statistics 
> and load them from hive metastore
> all fail in checkColStat eg: 
> java.lang.AssertionError: assertion failed
>   at scala.Predef$.assert(Predef.scala:156)
>   at 
> org.apache.spark.sql.StatisticsTest$.checkColStat(StatisticsTest.scala:92)
>   at 
> org.apache.spark.sql.StatisticsTest$$anonfun$checkColStats$1$$anonfun$apply$mcV$sp$1.apply(StatisticsTest.scala:43)
>   at 
> org.apache.spark.sql.StatisticsTest$$anonfun$checkColStats$1$$anonfun$apply$mcV$sp$1.apply(StatisticsTest.scala:40)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at 
> org.apache.spark.sql.StatisticsTest$$anonfun$checkColStats$1.apply$mcV$sp(StatisticsTest.scala:40)
>   at 
> org.apache.spark.sql.test.SQLTestUtils$class.withTable(SQLTestUtils.scala:168)
>   at 
> org.apache.spark.sql.StatisticsColumnSuite.withTable(StatisticsColumnSuite.scala:30)
>   at 
> org.apache.spark.sql.StatisticsTest$class.checkColStats(StatisticsTest.scala:33)
>   at 
> org.apache.spark.sql.StatisticsColumnSuite.checkColStats(StatisticsColumnSuite.scala:30)
>   at 
> org.apache.spark.sql.StatisticsColumnSuite$$anonfun$7.apply$mcV$sp(StatisticsColumnSuite.scala:171)
>   at 
> org.apache.spark.sql.StatisticsColumnSuite$$anonfun$7.apply(StatisticsColumnSuite.scala:160)
>   at 
> org.apache.spark.sql.StatisticsColumnSuite$$anonfun$7.apply(StatisticsColumnSuite.scala:160)






[jira] [Comment Edited] (SPARK-17827) StatisticsColumnSuite failures on big endian platforms

2016-10-13 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15571348#comment-15571348
 ] 

Pete Robbins edited comment on SPARK-17827 at 10/13/16 9:10 AM:


In Statistics.scala

case class StringColumnStat(statRow: InternalRow) {
  println("StringColumnStat: " + statRow)
  // The indices here must be consistent with 
`ColumnStatStruct.stringColumnStat`.
  val numNulls: Long = statRow.getLong(0)
  val avgColLen: Double = statRow.getDouble(1)
  val maxColLen: Long = statRow.getLong(2)   << Actual type in 
statRow is Int
  val ndv: Long = statRow.getLong(3)
}

case class BinaryColumnStat(statRow: InternalRow) {
  // The indices here must be consistent with 
`ColumnStatStruct.binaryColumnStat`.
  val numNulls: Long = statRow.getLong(0)
  val avgColLen: Double = statRow.getDouble(1)
  val maxColLen: Long = statRow.getLong(2)<< Actual type in 
statRow is Int
}

So either the code above should be using getInt for the maxColLen or the code 
generating the row should be creating a Long


was (Author: robbinspg):
In Statistics,scala

case class StringColumnStat(statRow: InternalRow) {
  println("StringColumnStat: " + statRow)
  // The indices here must be consistent with 
`ColumnStatStruct.stringColumnStat`.
  val numNulls: Long = statRow.getLong(0)
  val avgColLen: Double = statRow.getDouble(1)
  val maxColLen: Long = statRow.getLong(2)   << Actual type in 
statRow is Int
  val ndv: Long = statRow.getLong(3)
}

case class BinaryColumnStat(statRow: InternalRow) {
  // The indices here must be consistent with 
`ColumnStatStruct.binaryColumnStat`.
  val numNulls: Long = statRow.getLong(0)
  val avgColLen: Double = statRow.getDouble(1)
  val maxColLen: Long = statRow.getLong(2)<< Actual type in 
statRow is Int
}

So either the code above should be using getInt for the maxColLen or the code 
generating the row should be creating a Long

> StatisticsColumnSuite failures on big endian platforms
> --
>
> Key: SPARK-17827
> URL: https://issues.apache.org/jira/browse/SPARK-17827
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
> Environment: big endian
>Reporter: Pete Robbins
>  Labels: big-endian
>
> https://issues.apache.org/jira/browse/SPARK-17073
> introduces new tests/function that fails on big endian platforms
> Failing tests:
>  org.apache.spark.sql.StatisticsColumnSuite.column-level statistics for 
> string column
>  org.apache.spark.sql.StatisticsColumnSuite.column-level statistics for 
> binary column
>  org.apache.spark.sql.StatisticsColumnSuite.column-level statistics for 
> columns with different types
>  org.apache.spark.sql.hive.StatisticsSuite.generate column-level statistics 
> and load them from hive metastore
> all fail in checkColStat eg: 
> java.lang.AssertionError: assertion failed
>   at scala.Predef$.assert(Predef.scala:156)
>   at 
> org.apache.spark.sql.StatisticsTest$.checkColStat(StatisticsTest.scala:92)
>   at 
> org.apache.spark.sql.StatisticsTest$$anonfun$checkColStats$1$$anonfun$apply$mcV$sp$1.apply(StatisticsTest.scala:43)
>   at 
> org.apache.spark.sql.StatisticsTest$$anonfun$checkColStats$1$$anonfun$apply$mcV$sp$1.apply(StatisticsTest.scala:40)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at 
> org.apache.spark.sql.StatisticsTest$$anonfun$checkColStats$1.apply$mcV$sp(StatisticsTest.scala:40)
>   at 
> org.apache.spark.sql.test.SQLTestUtils$class.withTable(SQLTestUtils.scala:168)
>   at 
> org.apache.spark.sql.StatisticsColumnSuite.withTable(StatisticsColumnSuite.scala:30)
>   at 
> org.apache.spark.sql.StatisticsTest$class.checkColStats(StatisticsTest.scala:33)
>   at 
> org.apache.spark.sql.StatisticsColumnSuite.checkColStats(StatisticsColumnSuite.scala:30)
>   at 
> org.apache.spark.sql.StatisticsColumnSuite$$anonfun$7.apply$mcV$sp(StatisticsColumnSuite.scala:171)
>   at 
> org.apache.spark.sql.StatisticsColumnSuite$$anonfun$7.apply(StatisticsColumnSuite.scala:160)
>   at 
> org.apache.spark.sql.StatisticsColumnSuite$$anonfun$7.apply(StatisticsColumnSuite.scala:160)






[jira] [Commented] (SPARK-17827) StatisticsColumnSuite failures on big endian platforms

2016-10-13 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15571348#comment-15571348
 ] 

Pete Robbins commented on SPARK-17827:
--

In Statistics.scala

case class StringColumnStat(statRow: InternalRow) {
  println("StringColumnStat: " + statRow)
  // The indices here must be consistent with `ColumnStatStruct.stringColumnStat`.
  val numNulls: Long = statRow.getLong(0)
  val avgColLen: Double = statRow.getDouble(1)
  val maxColLen: Long = statRow.getLong(2)   // << Actual type in statRow is Int
  val ndv: Long = statRow.getLong(3)
}

case class BinaryColumnStat(statRow: InternalRow) {
  // The indices here must be consistent with `ColumnStatStruct.binaryColumnStat`.
  val numNulls: Long = statRow.getLong(0)
  val avgColLen: Double = statRow.getDouble(1)
  val maxColLen: Long = statRow.getLong(2)   // << Actual type in statRow is Int
}

So either the code above should be using getInt for the maxColLen or the code 
generating the row should be creating a Long

> StatisticsColumnSuite failures on big endian platforms
> --
>
> Key: SPARK-17827
> URL: https://issues.apache.org/jira/browse/SPARK-17827
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
> Environment: big endian
>Reporter: Pete Robbins
>  Labels: big-endian
>
> https://issues.apache.org/jira/browse/SPARK-17073
> introduces new tests/function that fails on big endian platforms
> Failing tests:
>  org.apache.spark.sql.StatisticsColumnSuite.column-level statistics for 
> string column
>  org.apache.spark.sql.StatisticsColumnSuite.column-level statistics for 
> binary column
>  org.apache.spark.sql.StatisticsColumnSuite.column-level statistics for 
> columns with different types
>  org.apache.spark.sql.hive.StatisticsSuite.generate column-level statistics 
> and load them from hive metastore
> all fail in checkColStat eg: 
> java.lang.AssertionError: assertion failed
>   at scala.Predef$.assert(Predef.scala:156)
>   at 
> org.apache.spark.sql.StatisticsTest$.checkColStat(StatisticsTest.scala:92)
>   at 
> org.apache.spark.sql.StatisticsTest$$anonfun$checkColStats$1$$anonfun$apply$mcV$sp$1.apply(StatisticsTest.scala:43)
>   at 
> org.apache.spark.sql.StatisticsTest$$anonfun$checkColStats$1$$anonfun$apply$mcV$sp$1.apply(StatisticsTest.scala:40)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at 
> org.apache.spark.sql.StatisticsTest$$anonfun$checkColStats$1.apply$mcV$sp(StatisticsTest.scala:40)
>   at 
> org.apache.spark.sql.test.SQLTestUtils$class.withTable(SQLTestUtils.scala:168)
>   at 
> org.apache.spark.sql.StatisticsColumnSuite.withTable(StatisticsColumnSuite.scala:30)
>   at 
> org.apache.spark.sql.StatisticsTest$class.checkColStats(StatisticsTest.scala:33)
>   at 
> org.apache.spark.sql.StatisticsColumnSuite.checkColStats(StatisticsColumnSuite.scala:30)
>   at 
> org.apache.spark.sql.StatisticsColumnSuite$$anonfun$7.apply$mcV$sp(StatisticsColumnSuite.scala:171)
>   at 
> org.apache.spark.sql.StatisticsColumnSuite$$anonfun$7.apply(StatisticsColumnSuite.scala:160)
>   at 
> org.apache.spark.sql.StatisticsColumnSuite$$anonfun$7.apply(StatisticsColumnSuite.scala:160)






[jira] [Commented] (SPARK-17827) StatisticsColumnSuite failures on big endian platforms

2016-10-12 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15569048#comment-15569048
 ] 

Pete Robbins commented on SPARK-17827:
--

So this looks like the max field is being written as an Int into the UnsafeRow 
but is later read back as a Long. Call stack to the write:

java.lang.Thread.dumpStack(Thread.java:462)
at 
org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter.write(UnsafeRowWriter.java:149)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
 Source)
at 
org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun$generateResultProjection$1.apply(AggregationIterator.scala:232)
at 
org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun$generateResultProjection$1.apply(AggregationIterator.scala:221)
at 
org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.next(TungstenAggregationIterator.scala:392)
at 
org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.next(TungstenAggregationIterator.scala:79)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at 
org.apache.spark.sql.execution.aggregate.AggregationIterator.foreach(AggregationIterator.scala:35)
at 
scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
at 
org.apache.spark.sql.execution.aggregate.AggregationIterator.to(AggregationIterator.scala:35)
at 
scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
at 
org.apache.spark.sql.execution.aggregate.AggregationIterator.toBuffer(AggregationIterator.scala:35)
at 
scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
at 
org.apache.spark.sql.execution.aggregate.AggregationIterator.toArray(AggregationIterator.scala:35)
at 
org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:912)
at 
org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:912)
at 
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1927)
at 
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1927)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1153)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.lang.Thread.run(Thread.java:785)

> StatisticsColumnSuite failures on big endian platforms
> --
>
> Key: SPARK-17827
> URL: https://issues.apache.org/jira/browse/SPARK-17827
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
> Environment: big endian
>Reporter: Pete Robbins
>  Labels: big-endian
>
> https://issues.apache.org/jira/browse/SPARK-17073
> introduces new tests/function that fails on big endian platforms
> Failing tests:
>  org.apache.spark.sql.StatisticsColumnSuite.column-level statistics for 
> string column
>  org.apache.spark.sql.StatisticsColumnSuite.column-level statistics for 
> binary column
>  org.apache.spark.sql.StatisticsColumnSuite.column-level statistics for 
> columns with different types
>  org.apache.spark.sql.hive.StatisticsSuite.generate column-level statistics 
> and load them from hive metastore
> all fail in checkColStat eg: 
> java.lang.AssertionError: assertion failed
>   at scala.Predef$.assert(Predef.scala:156)
>   at 
> org.apache.spark.sql.StatisticsTest$.checkColStat(StatisticsTest.scala:92)
>   at 
> org.apache.spark.sql.StatisticsTest$$anonfun$checkColStats$1$$anonfun$apply$mcV$sp$1.apply(StatisticsTest.scala:43)
>   at 
> org.apache.spark.sql.StatisticsTest$$anonfun$checkColStats$1$$anonfun$apply$mcV$sp$1.apply(StatisticsTest.scala:40)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at 
> org.apache.spark.sql.StatisticsTest$$anonfun$checkColStats$1.apply$mcV$sp(StatisticsTest.scala:40)
>   at 
> org.apache.spark.sql.test.SQLTestUtils$class.withTable(SQLTestUtils.scala:168)
>   at 
> org.apache.spark.sql.StatisticsColumnSuite.withTable(StatisticsColumnSuite.scala:30)
>   at 
> org.apache.spark.sql.StatisticsTest$class.checkColStats(StatisticsTest.scala:33)
>   at 
> 

[jira] [Commented] (SPARK-17827) StatisticsColumnSuite failures on big endian platforms

2016-10-10 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15561849#comment-15561849
 ] 

Pete Robbins commented on SPARK-17827:
--

[~ZenWzh] Any ideas what code was introduced that could cause endianness issues? This 
is usually something like writing a field as one type but reading it back as another, 
e.g. putLong but then readInt.

> StatisticsColumnSuite failures on big endian platforms
> --
>
> Key: SPARK-17827
> URL: https://issues.apache.org/jira/browse/SPARK-17827
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
> Environment: big endian
>Reporter: Pete Robbins
>  Labels: big-endian
>
> https://issues.apache.org/jira/browse/SPARK-17073
> introduces new tests/function that fails on big endian platforms
> Failing tests:
>  org.apache.spark.sql.StatisticsColumnSuite.column-level statistics for 
> string column
>  org.apache.spark.sql.StatisticsColumnSuite.column-level statistics for 
> binary column
>  org.apache.spark.sql.StatisticsColumnSuite.column-level statistics for 
> columns with different types
>  org.apache.spark.sql.hive.StatisticsSuite.generate column-level statistics 
> and load them from hive metastore
> all fail in checkColStat eg: 
> java.lang.AssertionError: assertion failed
>   at scala.Predef$.assert(Predef.scala:156)
>   at 
> org.apache.spark.sql.StatisticsTest$.checkColStat(StatisticsTest.scala:92)
>   at 
> org.apache.spark.sql.StatisticsTest$$anonfun$checkColStats$1$$anonfun$apply$mcV$sp$1.apply(StatisticsTest.scala:43)
>   at 
> org.apache.spark.sql.StatisticsTest$$anonfun$checkColStats$1$$anonfun$apply$mcV$sp$1.apply(StatisticsTest.scala:40)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at 
> org.apache.spark.sql.StatisticsTest$$anonfun$checkColStats$1.apply$mcV$sp(StatisticsTest.scala:40)
>   at 
> org.apache.spark.sql.test.SQLTestUtils$class.withTable(SQLTestUtils.scala:168)
>   at 
> org.apache.spark.sql.StatisticsColumnSuite.withTable(StatisticsColumnSuite.scala:30)
>   at 
> org.apache.spark.sql.StatisticsTest$class.checkColStats(StatisticsTest.scala:33)
>   at 
> org.apache.spark.sql.StatisticsColumnSuite.checkColStats(StatisticsColumnSuite.scala:30)
>   at 
> org.apache.spark.sql.StatisticsColumnSuite$$anonfun$7.apply$mcV$sp(StatisticsColumnSuite.scala:171)
>   at 
> org.apache.spark.sql.StatisticsColumnSuite$$anonfun$7.apply(StatisticsColumnSuite.scala:160)
>   at 
> org.apache.spark.sql.StatisticsColumnSuite$$anonfun$7.apply(StatisticsColumnSuite.scala:160)






[jira] [Commented] (SPARK-17827) StatisticsColumnSuite failures on big endian platforms

2016-10-07 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15554825#comment-15554825
 ] 

Pete Robbins commented on SPARK-17827:
--

I'm investigating this

> StatisticsColumnSuite failures on big endian platforms
> --
>
> Key: SPARK-17827
> URL: https://issues.apache.org/jira/browse/SPARK-17827
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
> Environment: big endian
>Reporter: Pete Robbins
>  Labels: big-endian
>
> https://issues.apache.org/jira/browse/SPARK-17073
> introduces new tests/function that fails on big endian platforms
> Failing tests:
>  org.apache.spark.sql.StatisticsColumnSuite.column-level statistics for 
> string column
>  org.apache.spark.sql.StatisticsColumnSuite.column-level statistics for 
> binary column
>  org.apache.spark.sql.StatisticsColumnSuite.column-level statistics for 
> columns with different types
>  org.apache.spark.sql.hive.StatisticsSuite.generate column-level statistics 
> and load them from hive metastore
> all fail in checkColStat eg: 
> java.lang.AssertionError: assertion failed
>   at scala.Predef$.assert(Predef.scala:156)
>   at 
> org.apache.spark.sql.StatisticsTest$.checkColStat(StatisticsTest.scala:92)
>   at 
> org.apache.spark.sql.StatisticsTest$$anonfun$checkColStats$1$$anonfun$apply$mcV$sp$1.apply(StatisticsTest.scala:43)
>   at 
> org.apache.spark.sql.StatisticsTest$$anonfun$checkColStats$1$$anonfun$apply$mcV$sp$1.apply(StatisticsTest.scala:40)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at 
> org.apache.spark.sql.StatisticsTest$$anonfun$checkColStats$1.apply$mcV$sp(StatisticsTest.scala:40)
>   at 
> org.apache.spark.sql.test.SQLTestUtils$class.withTable(SQLTestUtils.scala:168)
>   at 
> org.apache.spark.sql.StatisticsColumnSuite.withTable(StatisticsColumnSuite.scala:30)
>   at 
> org.apache.spark.sql.StatisticsTest$class.checkColStats(StatisticsTest.scala:33)
>   at 
> org.apache.spark.sql.StatisticsColumnSuite.checkColStats(StatisticsColumnSuite.scala:30)
>   at 
> org.apache.spark.sql.StatisticsColumnSuite$$anonfun$7.apply$mcV$sp(StatisticsColumnSuite.scala:171)
>   at 
> org.apache.spark.sql.StatisticsColumnSuite$$anonfun$7.apply(StatisticsColumnSuite.scala:160)
>   at 
> org.apache.spark.sql.StatisticsColumnSuite$$anonfun$7.apply(StatisticsColumnSuite.scala:160)






[jira] [Created] (SPARK-17827) StatisticsColumnSuite failures on big endian platforms

2016-10-07 Thread Pete Robbins (JIRA)
Pete Robbins created SPARK-17827:


 Summary: StatisticsColumnSuite failures on big endian platforms
 Key: SPARK-17827
 URL: https://issues.apache.org/jira/browse/SPARK-17827
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.1.0
 Environment: big endian
Reporter: Pete Robbins


https://issues.apache.org/jira/browse/SPARK-17073

introduces new tests/functionality that fail on big-endian platforms

Failing tests:

 org.apache.spark.sql.StatisticsColumnSuite.column-level statistics for string 
column
 org.apache.spark.sql.StatisticsColumnSuite.column-level statistics for binary 
column
 org.apache.spark.sql.StatisticsColumnSuite.column-level statistics for columns 
with different types
 org.apache.spark.sql.hive.StatisticsSuite.generate column-level statistics and 
load them from hive metastore

all fail in checkColStat eg: 
java.lang.AssertionError: assertion failed
  at scala.Predef$.assert(Predef.scala:156)
  at 
org.apache.spark.sql.StatisticsTest$.checkColStat(StatisticsTest.scala:92)
  at 
org.apache.spark.sql.StatisticsTest$$anonfun$checkColStats$1$$anonfun$apply$mcV$sp$1.apply(StatisticsTest.scala:43)
  at 
org.apache.spark.sql.StatisticsTest$$anonfun$checkColStats$1$$anonfun$apply$mcV$sp$1.apply(StatisticsTest.scala:40)
  at scala.collection.immutable.List.foreach(List.scala:381)
  at 
org.apache.spark.sql.StatisticsTest$$anonfun$checkColStats$1.apply$mcV$sp(StatisticsTest.scala:40)
  at 
org.apache.spark.sql.test.SQLTestUtils$class.withTable(SQLTestUtils.scala:168)
  at 
org.apache.spark.sql.StatisticsColumnSuite.withTable(StatisticsColumnSuite.scala:30)
  at 
org.apache.spark.sql.StatisticsTest$class.checkColStats(StatisticsTest.scala:33)
  at 
org.apache.spark.sql.StatisticsColumnSuite.checkColStats(StatisticsColumnSuite.scala:30)
  at 
org.apache.spark.sql.StatisticsColumnSuite$$anonfun$7.apply$mcV$sp(StatisticsColumnSuite.scala:171)
  at 
org.apache.spark.sql.StatisticsColumnSuite$$anonfun$7.apply(StatisticsColumnSuite.scala:160)
  at 
org.apache.spark.sql.StatisticsColumnSuite$$anonfun$7.apply(StatisticsColumnSuite.scala:160)






[jira] [Updated] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String

2016-06-16 Thread Pete Robbins (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pete Robbins updated SPARK-15822:
-
Component/s: SQL

> segmentation violation in o.a.s.unsafe.types.UTF8String 
> 
>
> Key: SPARK-15822
> URL: https://issues.apache.org/jira/browse/SPARK-15822
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
> Environment: linux amd64
> openjdk version "1.8.0_91"
> OpenJDK Runtime Environment (build 1.8.0_91-b14)
> OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
>Reporter: Pete Robbins
>Assignee: Herman van Hovell
>Priority: Blocker
>
> Executors fail with segmentation violation while running application with
> spark.memory.offHeap.enabled true
> spark.memory.offHeap.size 512m
> Also now reproduced with 
> spark.memory.offHeap.enabled false
> {noformat}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7f4559b4d4bd, pid=14182, tid=139935319750400
> #
> # JRE version: OpenJDK Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14)
> # Java VM: OpenJDK 64-Bit Server VM (25.91-b14 mixed mode linux-amd64 
> compressed oops)
> # Problematic frame:
> # J 4816 C2 
> org.apache.spark.unsafe.types.UTF8String.compareTo(Lorg/apache/spark/unsafe/types/UTF8String;)I
>  (64 bytes) @ 0x7f4559b4d4bd [0x7f4559b4d460+0x5d]
> {noformat}
> We initially saw this on IBM java on PowerPC box but is recreatable on linux 
> with OpenJDK. On linux with IBM Java 8 we see a null pointer exception at the 
> same code point:
> {noformat}
> 16/06/08 11:14:58 ERROR Executor: Exception in task 1.0 in stage 5.0 (TID 48)
> java.lang.NullPointerException
>   at 
> org.apache.spark.unsafe.types.UTF8String.compareTo(UTF8String.java:831)
>   at org.apache.spark.unsafe.types.UTF8String.compare(UTF8String.java:844)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.findNextInnerJoinRows$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$doExecute$2$$anon$2.hasNext(WholeStageCodegenExec.scala:377)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>   at 
> scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30)
>   at org.spark_project.guava.collect.Ordering.leastOf(Ordering.java:664)
>   at org.apache.spark.util.collection.Utils$.takeOrdered(Utils.scala:37)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1365)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1362)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:85)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1153)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.lang.Thread.run(Thread.java:785)
> {noformat}






[jira] [Commented] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String

2016-06-16 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333594#comment-15333594
 ] 

Pete Robbins commented on SPARK-15822:
--

Tracking where the memory is being allocated, it is interesting that for the 
executor that crashes (Executor task launch worker-3) the Unsafe off-heap 
memory is allocated in a very different address range from the other executors. 
I think the generated code is incorrect anyway, but it may be passing accidentally 
because the memory in the freed page is sometimes still accessible?

{noformat}
org.apache.spark.unsafe.memory.UnsafeMemoryAllocator@7bf8aaef Executor task 
launch worker-3 UnsafeMemoryAllocator.allocated: 262144 at *28900976*
org.apache.spark.unsafe.memory.UnsafeMemoryAllocator@7bf8aaef Executor task 
launch worker-1 UnsafeMemoryAllocator.allocated: 262144 at 140689774260048
org.apache.spark.unsafe.memory.UnsafeMemoryAllocator@7bf8aaef Executor task 
launch worker-4 UnsafeMemoryAllocator.allocated: 262144 at 140689572734864
org.apache.spark.unsafe.memory.UnsafeMemoryAllocator@7bf8aaef Executor task 
launch worker-7 UnsafeMemoryAllocator.allocated: 262144 at 140689908250560
org.apache.spark.unsafe.memory.UnsafeMemoryAllocator@7bf8aaef Executor task 
launch worker-8 UnsafeMemoryAllocator.allocated: 262144 at 140690243746416
org.apache.spark.unsafe.memory.UnsafeMemoryAllocator@7bf8aaef Executor task 
launch worker-0 UnsafeMemoryAllocator.allocated: 262144 at 140690176924448
org.apache.spark.unsafe.memory.UnsafeMemoryAllocator@7bf8aaef Executor task 
launch worker-5 UnsafeMemoryAllocator.allocated: 262144 at 140689773997888
org.apache.spark.unsafe.memory.UnsafeMemoryAllocator@7bf8aaef Executor task 
launch worker-6 UnsafeMemoryAllocator.allocated: 262144 at 140689707058576
{noformat}

> segmentation violation in o.a.s.unsafe.types.UTF8String 
> 
>
> Key: SPARK-15822
> URL: https://issues.apache.org/jira/browse/SPARK-15822
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
> Environment: linux amd64
> openjdk version "1.8.0_91"
> OpenJDK Runtime Environment (build 1.8.0_91-b14)
> OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
>Reporter: Pete Robbins
>Assignee: Herman van Hovell
>Priority: Blocker
>
> Executors fail with segmentation violation while running application with
> spark.memory.offHeap.enabled true
> spark.memory.offHeap.size 512m
> Also now reproduced with 
> spark.memory.offHeap.enabled false
> {noformat}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7f4559b4d4bd, pid=14182, tid=139935319750400
> #
> # JRE version: OpenJDK Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14)
> # Java VM: OpenJDK 64-Bit Server VM (25.91-b14 mixed mode linux-amd64 
> compressed oops)
> # Problematic frame:
> # J 4816 C2 
> org.apache.spark.unsafe.types.UTF8String.compareTo(Lorg/apache/spark/unsafe/types/UTF8String;)I
>  (64 bytes) @ 0x7f4559b4d4bd [0x7f4559b4d460+0x5d]
> {noformat}
> We initially saw this on IBM java on PowerPC box but is recreatable on linux 
> with OpenJDK. On linux with IBM Java 8 we see a null pointer exception at the 
> same code point:
> {noformat}
> 16/06/08 11:14:58 ERROR Executor: Exception in task 1.0 in stage 5.0 (TID 48)
> java.lang.NullPointerException
>   at 
> org.apache.spark.unsafe.types.UTF8String.compareTo(UTF8String.java:831)
>   at org.apache.spark.unsafe.types.UTF8String.compare(UTF8String.java:844)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.findNextInnerJoinRows$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$doExecute$2$$anon$2.hasNext(WholeStageCodegenExec.scala:377)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>   at 
> scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30)
>   at org.spark_project.guava.collect.Ordering.leastOf(Ordering.java:664)
>   at org.apache.spark.util.collection.Utils$.takeOrdered(Utils.scala:37)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1365)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1362)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at 

[jira] [Comment Edited] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String

2016-06-16 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333594#comment-15333594
 ] 

Pete Robbins edited comment on SPARK-15822 at 6/16/16 11:23 AM:


Tracking where the memory is being allocated, it is interesting that for the 
executor that crashes (Executor task launch worker-3) the Unsafe off-heap 
memory is allocated in a very different address range from the other executors. 
I think the generated code is incorrect anyway, but it may be passing accidentally 
because the memory in the freed page is sometimes still accessible?

org.apache.spark.unsafe.memory.UnsafeMemoryAllocator@7bf8aaef Executor task 
launch worker-3 UnsafeMemoryAllocator.allocated: 262144 at *28900976*
org.apache.spark.unsafe.memory.UnsafeMemoryAllocator@7bf8aaef Executor task 
launch worker-1 UnsafeMemoryAllocator.allocated: 262144 at 140689774260048
org.apache.spark.unsafe.memory.UnsafeMemoryAllocator@7bf8aaef Executor task 
launch worker-4 UnsafeMemoryAllocator.allocated: 262144 at 140689572734864
org.apache.spark.unsafe.memory.UnsafeMemoryAllocator@7bf8aaef Executor task 
launch worker-7 UnsafeMemoryAllocator.allocated: 262144 at 140689908250560
org.apache.spark.unsafe.memory.UnsafeMemoryAllocator@7bf8aaef Executor task 
launch worker-8 UnsafeMemoryAllocator.allocated: 262144 at 140690243746416
org.apache.spark.unsafe.memory.UnsafeMemoryAllocator@7bf8aaef Executor task 
launch worker-0 UnsafeMemoryAllocator.allocated: 262144 at 140690176924448
org.apache.spark.unsafe.memory.UnsafeMemoryAllocator@7bf8aaef Executor task 
launch worker-5 UnsafeMemoryAllocator.allocated: 262144 at 140689773997888
org.apache.spark.unsafe.memory.UnsafeMemoryAllocator@7bf8aaef Executor task 
launch worker-6 UnsafeMemoryAllocator.allocated: 262144 at 140689707058576



was (Author: robbinspg):
Tracking where the memory is being allocated it is interesting that for the 
Executor that crashes (Executor task launch worker-3) the Unsafe off-heap 
memory is allocated in a much different address range than the other executors. 
I think the generated code is incorrect anyway but may be accidentally passing 
as sometimes the memory in the freed page is still accessible?

{noformat}
org.apache.spark.unsafe.memory.UnsafeMemoryAllocator@7bf8aaef Executor task 
launch worker-3 UnsafeMemoryAllocator.allocated: 262144 at *28900976*
org.apache.spark.unsafe.memory.UnsafeMemoryAllocator@7bf8aaef Executor task 
launch worker-1 UnsafeMemoryAllocator.allocated: 262144 at 140689774260048
org.apache.spark.unsafe.memory.UnsafeMemoryAllocator@7bf8aaef Executor task 
launch worker-4 UnsafeMemoryAllocator.allocated: 262144 at 140689572734864
org.apache.spark.unsafe.memory.UnsafeMemoryAllocator@7bf8aaef Executor task 
launch worker-7 UnsafeMemoryAllocator.allocated: 262144 at 140689908250560
org.apache.spark.unsafe.memory.UnsafeMemoryAllocator@7bf8aaef Executor task 
launch worker-8 UnsafeMemoryAllocator.allocated: 262144 at 140690243746416
org.apache.spark.unsafe.memory.UnsafeMemoryAllocator@7bf8aaef Executor task 
launch worker-0 UnsafeMemoryAllocator.allocated: 262144 at 140690176924448
org.apache.spark.unsafe.memory.UnsafeMemoryAllocator@7bf8aaef Executor task 
launch worker-5 UnsafeMemoryAllocator.allocated: 262144 at 140689773997888
org.apache.spark.unsafe.memory.UnsafeMemoryAllocator@7bf8aaef Executor task 
launch worker-6 UnsafeMemoryAllocator.allocated: 262144 at 140689707058576
{noformat}

> segmentation violation in o.a.s.unsafe.types.UTF8String 
> 
>
> Key: SPARK-15822
> URL: https://issues.apache.org/jira/browse/SPARK-15822
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
> Environment: linux amd64
> openjdk version "1.8.0_91"
> OpenJDK Runtime Environment (build 1.8.0_91-b14)
> OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
>Reporter: Pete Robbins
>Assignee: Herman van Hovell
>Priority: Blocker
>
> Executors fail with segmentation violation while running application with
> spark.memory.offHeap.enabled true
> spark.memory.offHeap.size 512m
> Also now reproduced with 
> spark.memory.offHeap.enabled false
> {noformat}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7f4559b4d4bd, pid=14182, tid=139935319750400
> #
> # JRE version: OpenJDK Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14)
> # Java VM: OpenJDK 64-Bit Server VM (25.91-b14 mixed mode linux-amd64 
> compressed oops)
> # Problematic frame:
> # J 4816 C2 
> org.apache.spark.unsafe.types.UTF8String.compareTo(Lorg/apache/spark/unsafe/types/UTF8String;)I
>  (64 bytes) @ 0x7f4559b4d4bd [0x7f4559b4d460+0x5d]
> {noformat}
> We initially saw this on IBM java on PowerPC box but is recreatable on linux 
> with OpenJDK. On linux with IBM Java 8 we 

[jira] [Comment Edited] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String

2016-06-16 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333586#comment-15333586
 ] 

Pete Robbins edited comment on SPARK-15822 at 6/16/16 11:18 AM:


OK, so I think I know what is happening!

In the following generated SMJ code, the call to leftIter.next() on line 058 will 
return either a Row constructed by pointing into a page OR, for the final Row in 
the iterator, a copy of the row, after which the underlying memory pages are 
freed in cleanupResources. This means that any Rows previously returned 
from the iterator are invalid, as they are addressing freed memory. In the case 
where I see the segv, the Row assigned to smj_value6 is pointing into freed 
memory, and this causes the segmentation fault.


{code}
/* 001 */ public Object generate(Object[] references) {
/* 002 */   return new GeneratedIterator(references);
/* 003 */ }
/* 004 */
/* 005 */ final class GeneratedIterator extends 
org.apache.spark.sql.execution.BufferedRowIterator {
/* 006 */   private Object[] references;
/* 007 */   private scala.collection.Iterator smj_leftInput;
/* 008 */   private scala.collection.Iterator smj_rightInput;
/* 009 */   private InternalRow smj_leftRow;
/* 010 */   private InternalRow smj_rightRow;
/* 011 */   private UTF8String smj_value4;
/* 012 */   private UTF8String smj_value5;
/* 013 */   private java.util.ArrayList smj_matches;
/* 014 */   private UTF8String smj_value6;
/* 015 */   private UTF8String smj_value7;
/* 016 */   private UTF8String smj_value8;
/* 017 */   private boolean smj_isNull4;
/* 018 */   private UTF8String smj_value9;
/* 019 */   private boolean smj_isNull5;
/* 020 */   private long smj_value10;
/* 021 */   private org.apache.spark.sql.execution.metric.SQLMetric 
smj_numOutputRows;
/* 022 */   private UnsafeRow smj_result;
/* 023 */   private 
org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder smj_holder;
/* 024 */   private 
org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter smj_rowWriter;
/* 025 */   private UnsafeRow project_result;
/* 026 */   private 
org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder project_holder;
/* 027 */   private 
org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter 
project_rowWriter;
/* 028 */
/* 029 */   public GeneratedIterator(Object[] references) {
/* 030 */ this.references = references;
/* 031 */   }
/* 032 */
/* 033 */   public void init(int index, scala.collection.Iterator inputs[]) {
/* 034 */ partitionIndex = index;
/* 035 */ smj_leftInput = inputs[0];
/* 036 */ smj_rightInput = inputs[1];
/* 037 */
/* 038 */ smj_rightRow = null;
/* 039 */
/* 040 */ smj_matches = new java.util.ArrayList();
/* 041 */
/* 042 */ this.smj_numOutputRows = 
(org.apache.spark.sql.execution.metric.SQLMetric) references[0];
/* 043 */ smj_result = new UnsafeRow(6);
/* 044 */ this.smj_holder = new 
org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(smj_result, 128);
/* 045 */ this.smj_rowWriter = new 
org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(smj_holder, 
6);
/* 046 */ project_result = new UnsafeRow(3);
/* 047 */ this.project_holder = new 
org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(project_result, 
64);
/* 048 */ this.project_rowWriter = new 
org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(project_holder,
 3);
/* 049 */   }
/* 050 */
/* 051 */   private boolean findNextInnerJoinRows(
/* 052 */ scala.collection.Iterator leftIter,
/* 053 */ scala.collection.Iterator rightIter) {
/* 054 */ smj_leftRow = null;
/* 055 */ int comp = 0;
/* 056 */ while (smj_leftRow == null) {
/* 057 */   if (!leftIter.hasNext()) return false;
/* 058 */   smj_leftRow = (InternalRow) leftIter.next();
/* 059 */
/* 060 */   boolean smj_isNull = smj_leftRow.isNullAt(0);
/* 061 */   UTF8String smj_value = smj_isNull ? null : 
(smj_leftRow.getUTF8String(0));
/* 062 */
/* 063 */   boolean smj_isNull1 = smj_leftRow.isNullAt(1);
/* 064 */   UTF8String smj_value1 = smj_isNull1 ? null : 
(smj_leftRow.getUTF8String(1));
/* 065 */   if (smj_isNull || smj_isNull1) {
/* 066 */ smj_leftRow = null;
/* 067 */ continue;
/* 068 */   }
/* 069 */   if (!smj_matches.isEmpty()) {
/* 070 */ comp = 0;
/* 071 */ if (comp == 0) {
/* 072 */   comp = smj_value.compare(smj_value6);
/* 073 */ }
/* 074 */ if (comp == 0) {
/* 075 */   comp = smj_value1.compare(smj_value7);
/* 076 */ }
/* 077 */
/* 078 */ if (comp == 0) {
/* 079 */   return true;
/* 080 */ }
/* 081 */ smj_matches.clear();
/* 082 */   }
/* 083 */
/* 084 */   do {
/* 085 */ if (smj_rightRow == null) {
/* 086 */   if (!rightIter.hasNext()) {
/* 087 */ smj_value6 = 

[jira] [Commented] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String

2016-06-16 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333586#comment-15333586
 ] 

Pete Robbins commented on SPARK-15822:
--

OK, so I think I know what is happening!

In the following generated SMJ code, the call to leftIter.next() on line 058 will 
return either a Row constructed by pointing into a page OR, for the final Row in 
the iterator, a copy of the row, after which the memory pages are freed in 
cleanupResources. This means that any Rows previously returned from the 
iterator are invalid, as they are addressing freed memory. In the case where I 
see the segv, the Row assigned to smj_value6 is pointing into freed memory, and 
this causes the segmentation fault.


{code}
/* 001 */ public Object generate(Object[] references) {
/* 002 */   return new GeneratedIterator(references);
/* 003 */ }
/* 004 */
/* 005 */ final class GeneratedIterator extends 
org.apache.spark.sql.execution.BufferedRowIterator {
/* 006 */   private Object[] references;
/* 007 */   private scala.collection.Iterator smj_leftInput;
/* 008 */   private scala.collection.Iterator smj_rightInput;
/* 009 */   private InternalRow smj_leftRow;
/* 010 */   private InternalRow smj_rightRow;
/* 011 */   private UTF8String smj_value4;
/* 012 */   private UTF8String smj_value5;
/* 013 */   private java.util.ArrayList smj_matches;
/* 014 */   private UTF8String smj_value6;
/* 015 */   private UTF8String smj_value7;
/* 016 */   private UTF8String smj_value8;
/* 017 */   private boolean smj_isNull4;
/* 018 */   private UTF8String smj_value9;
/* 019 */   private boolean smj_isNull5;
/* 020 */   private long smj_value10;
/* 021 */   private org.apache.spark.sql.execution.metric.SQLMetric 
smj_numOutputRows;
/* 022 */   private UnsafeRow smj_result;
/* 023 */   private 
org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder smj_holder;
/* 024 */   private 
org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter smj_rowWriter;
/* 025 */   private UnsafeRow project_result;
/* 026 */   private 
org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder project_holder;
/* 027 */   private 
org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter 
project_rowWriter;
/* 028 */
/* 029 */   public GeneratedIterator(Object[] references) {
/* 030 */ this.references = references;
/* 031 */   }
/* 032 */
/* 033 */   public void init(int index, scala.collection.Iterator inputs[]) {
/* 034 */ partitionIndex = index;
/* 035 */ smj_leftInput = inputs[0];
/* 036 */ smj_rightInput = inputs[1];
/* 037 */
/* 038 */ smj_rightRow = null;
/* 039 */
/* 040 */ smj_matches = new java.util.ArrayList();
/* 041 */
/* 042 */ this.smj_numOutputRows = 
(org.apache.spark.sql.execution.metric.SQLMetric) references[0];
/* 043 */ smj_result = new UnsafeRow(6);
/* 044 */ this.smj_holder = new 
org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(smj_result, 128);
/* 045 */ this.smj_rowWriter = new 
org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(smj_holder, 
6);
/* 046 */ project_result = new UnsafeRow(3);
/* 047 */ this.project_holder = new 
org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(project_result, 
64);
/* 048 */ this.project_rowWriter = new 
org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(project_holder,
 3);
/* 049 */   }
/* 050 */
/* 051 */   private boolean findNextInnerJoinRows(
/* 052 */ scala.collection.Iterator leftIter,
/* 053 */ scala.collection.Iterator rightIter) {
/* 054 */ smj_leftRow = null;
/* 055 */ int comp = 0;
/* 056 */ while (smj_leftRow == null) {
/* 057 */   if (!leftIter.hasNext()) return false;
/* 058 */   smj_leftRow = (InternalRow) leftIter.next();
/* 059 */
/* 060 */   boolean smj_isNull = smj_leftRow.isNullAt(0);
/* 061 */   UTF8String smj_value = smj_isNull ? null : 
(smj_leftRow.getUTF8String(0));
/* 062 */
/* 063 */   boolean smj_isNull1 = smj_leftRow.isNullAt(1);
/* 064 */   UTF8String smj_value1 = smj_isNull1 ? null : 
(smj_leftRow.getUTF8String(1));
/* 065 */   if (smj_isNull || smj_isNull1) {
/* 066 */ smj_leftRow = null;
/* 067 */ continue;
/* 068 */   }
/* 069 */   if (!smj_matches.isEmpty()) {
/* 070 */ comp = 0;
/* 071 */ if (comp == 0) {
/* 072 */   comp = smj_value.compare(smj_value6);
/* 073 */ }
/* 074 */ if (comp == 0) {
/* 075 */   comp = smj_value1.compare(smj_value7);
/* 076 */ }
/* 077 */
/* 078 */ if (comp == 0) {
/* 079 */   return true;
/* 080 */ }
/* 081 */ smj_matches.clear();
/* 082 */   }
/* 083 */
/* 084 */   do {
/* 085 */ if (smj_rightRow == null) {
/* 086 */   if (!rightIter.hasNext()) {
/* 087 */ smj_value6 = smj_value;
/* 088 */
/* 089 */ smj_value7 = 

[jira] [Commented] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String

2016-06-16 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333462#comment-15333462
 ] 

Pete Robbins commented on SPARK-15822:
--

Tracing through off-heap memory allocation, this looks like the segv is caused 
by the UTF8String base+offset still addressing a page that has recently been 
freed.
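
A minimal plain-Scala sketch of that hazard in general terms (illustrative only, not the actual UnsafeRow/UTF8String code): a row handed out by an iterator that merely points into a shared buffer becomes invalid as soon as that buffer is recycled or freed, unless it was copied first:

{code}
// Plain-Scala illustration only; the real case involves off-heap pages freed
// in cleanupResources rather than a reused on-heap array.
final class Row(backing: Array[Byte], offset: Int) {
  def value: Byte = backing(offset)
  def copy(): Row = new Row(Array(backing(offset)), 0)  // detach from the shared buffer
}

val shared = new Array[Byte](1)                // stands in for a memory page
def nextRow(v: Byte): Row = { shared(0) = v; new Row(shared, 0) }

val first = nextRow(1)                         // a view into the shared page
val firstCopy = first.copy()                   // owns its own bytes
nextRow(2)                                     // page reused (or freed)

assert(firstCopy.value == 1)                   // the copy is still valid
assert(first.value == 2)                       // the held view changed underneath us
{code}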

> segmentation violation in o.a.s.unsafe.types.UTF8String 
> 
>
> Key: SPARK-15822
> URL: https://issues.apache.org/jira/browse/SPARK-15822
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
> Environment: linux amd64
> openjdk version "1.8.0_91"
> OpenJDK Runtime Environment (build 1.8.0_91-b14)
> OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
>Reporter: Pete Robbins
>Assignee: Herman van Hovell
>Priority: Blocker
>
> Executors fail with segmentation violation while running application with
> spark.memory.offHeap.enabled true
> spark.memory.offHeap.size 512m
> Also now reproduced with 
> spark.memory.offHeap.enabled false
> {noformat}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7f4559b4d4bd, pid=14182, tid=139935319750400
> #
> # JRE version: OpenJDK Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14)
> # Java VM: OpenJDK 64-Bit Server VM (25.91-b14 mixed mode linux-amd64 
> compressed oops)
> # Problematic frame:
> # J 4816 C2 
> org.apache.spark.unsafe.types.UTF8String.compareTo(Lorg/apache/spark/unsafe/types/UTF8String;)I
>  (64 bytes) @ 0x7f4559b4d4bd [0x7f4559b4d460+0x5d]
> {noformat}
> We initially saw this on IBM java on PowerPC box but is recreatable on linux 
> with OpenJDK. On linux with IBM Java 8 we see a null pointer exception at the 
> same code point:
> {noformat}
> 16/06/08 11:14:58 ERROR Executor: Exception in task 1.0 in stage 5.0 (TID 48)
> java.lang.NullPointerException
>   at 
> org.apache.spark.unsafe.types.UTF8String.compareTo(UTF8String.java:831)
>   at org.apache.spark.unsafe.types.UTF8String.compare(UTF8String.java:844)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.findNextInnerJoinRows$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$doExecute$2$$anon$2.hasNext(WholeStageCodegenExec.scala:377)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>   at 
> scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30)
>   at org.spark_project.guava.collect.Ordering.leastOf(Ordering.java:664)
>   at org.apache.spark.util.collection.Utils$.takeOrdered(Utils.scala:37)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1365)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1362)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:85)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1153)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.lang.Thread.run(Thread.java:785)
> {noformat}






[jira] [Commented] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String

2016-06-15 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15331580#comment-15331580
 ] 

Pete Robbins commented on SPARK-15822:
--

I can also recreate this issue on Oracle JDK 1.8:

{noformat}
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x7f0c65d06aec, pid=7521, tid=0x7f0b69ffd700
#
# JRE version: Java(TM) SE Runtime Environment (8.0_92-b14) (build 1.8.0_92-b14)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.92-b14 mixed mode linux-amd64 
compressed oops)
# Problematic frame:
# J 7453 C1 org.apache.spark.unsafe.Platform.getByte(Ljava/lang/Object;J)B (9 
bytes) @ 0x7f0c65d06aec [0x7f0c65d06ae0+0xc]
#
# Failed to write core dump. Core dumps have been disabled. To enable core 
dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#

---  T H R E A D  ---

Current thread (0x7f0bf4008800):  JavaThread "Executor task launch 
worker-3" daemon [_thread_in_Java, id=7662, 
stack(0x7f0b69efd000,0x7f0b69ffe000)]

siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 
0x02868e54

Registers:
RAX=0x7f0c461abb38, RBX=0x7f0c461abb38, RCX=0x7f0c213547c8, 
RDX=0x02868e54
RSP=0x7f0b69ffba40, RBP=0x7f0b69ffbae0, RSI=0x, 
RDI=0x0001008254d8
R8 =0x200bd0a6, R9 =0xd9fa2650, R10=0x7f0c79d39020, 
R11=0x7f0c65d06ae0
R12=0x, R13=0x7f0b69ffba88, R14=0x7f0b69ffbaf8, 
R15=0x7f0bf4008800
RIP=0x7f0c65d06aec, EFLAGS=0x00010202, CSGSFS=0x0033, 
ERR=0x0004
  TRAPNO=0x000e

Top of Stack: (sp=0x7f0b69ffba40)
0x7f0b69ffba40:   7f0b684b4a70 
0x7f0b69ffba50:   7f0b69ffbb10 7f0c65e96d4c
0x7f0b69ffba60:   7f0c65008040 d9fa2628
0x7f0b69ffba70:   7f0b69ffbae0 7f0c650079c0
0x7f0b69ffba80:   7f0c650079c0 02868e54
0x7f0b69ffba90:   0030 
0x7f0b69ffbaa0:   7f0b69ffbaa0 7f0c21351403
0x7f0b69ffbab0:   7f0b69ffbaf8 7f0c213547c8
0x7f0b69ffbac0:    7f0c21351428
0x7f0b69ffbad0:   7f0b69ffba88 7f0b69ffbaf0
0x7f0b69ffbae0:   7f0b69ffbb48 7f0c650079c0
0x7f0b69ffbaf0:    d9f57cf0
0x7f0b69ffbb00:   004c 7f0b69ffbb08
0x7f0b69ffbb10:   7f0c21353726 7f0b69ffbb78
0x7f0b69ffbb20:   7f0c213547c8 
0x7f0b69ffbb30:   7f0c213537a0 7f0b69ffbaf0
0x7f0b69ffbb40:   7f0b69ffbb70 7f0b69ffbbc0
0x7f0b69ffbb50:   7f0c65007d00 
0x7f0b69ffbb60:    0003
0x7f0b69ffbb70:   d9f57cf0 d9fa33b0
0x7f0b69ffbb80:   7f0b69ffbb80 7f0c2135385a
0x7f0b69ffbb90:   7f0b69ffbbd8 7f0c213547c8
0x7f0b69ffbba0:    7f0c21353880
0x7f0b69ffbbb0:   7f0b69ffbb70 7f0b69ffbbd0
0x7f0b69ffbbc0:   7f0b69ffbc20 7f0c65007d00
0x7f0b69ffbbd0:   d9f57cf0 d9fa33b0
0x7f0b69ffbbe0:   7f0b69ffbbe0 7f0b684a24e5
0x7f0b69ffbbf0:   7f0b69ffbc88 7f0b684a2950
0x7f0b69ffbc00:    7f0b684a2618
0x7f0b69ffbc10:   7f0b69ffbbd0 7f0b69ffbc78
0x7f0b69ffbc20:   7f0b69ffbcd0 7f0c65007a90
0x7f0b69ffbc30:     

Instructions: (pc=0x7f0c65d06aec)
0x7f0c65d06acc:   0a 80 11 64 01 f8 12 fe 06 90 0c 64 01 f8 12 fe
0x7f0c65d06adc:   06 90 0c 64 89 84 24 00 c0 fe ff 55 48 83 ec 30
0x7f0c65d06aec:   0f be 04 16 c1 e0 18 c1 f8 18 48 83 c4 30 5d 85
0x7f0c65d06afc:   05 ff f5 28 14 c3 90 90 49 8b 87 a8 02 00 00 49 

Register to memory mapping:

RAX={method} {0x7f0c461abb38} 'getByte' '(Ljava/lang/Object;J)B' in 
'org/apache/spark/unsafe/Platform'
RBX={method} {0x7f0c461abb38} 'getByte' '(Ljava/lang/Object;J)B' in 
'org/apache/spark/unsafe/Platform'
RCX=0x7f0c213547c8 is pointing into metadata
RDX=0x02868e54 is an unknown value
RSP=0x7f0b69ffba40 is pointing into the stack for thread: 0x7f0bf4008800
RBP=0x7f0b69ffbae0 is pointing into the stack for thread: 0x7f0bf4008800
RSI=0x is an unknown value
RDI=0x0001008254d8 is pointing into metadata
R8 =0x200bd0a6 is an unknown value
R9 =0xd9fa2650 is an oop
[B 
 - klass: {type array byte}
 - length: 48
R10=0x7f0c79d39020:  in 
/home/robbins/sdks/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so at 
0x7f0c78d7d000
R11=0x7f0c65d06ae0 is at entry_point+0 in (nmethod*)0x7f0c65d06990
R12=0x is an unknown value
R13=0x7f0b69ffba88 is pointing into 

[jira] [Commented] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String

2016-06-15 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15331375#comment-15331375
 ] 

Pete Robbins commented on SPARK-15822:
--

and the plan:

{noformat}
== Parsed Logical Plan ==
'Project [unresolvedalias('Origin, None), unresolvedalias('UniqueCarrier, 
None), 'round((('count * 100) / 'total), 2) AS rank#173]
+- Project [Origin#16, UniqueCarrier#8, count#134L, total#97L]
   +- Join Inner, ((Origin#16 = Origin#155) && (UniqueCarrier#8 = 
UniqueCarrier#147))
  :- Aggregate [Origin#16, UniqueCarrier#8], [Origin#16, UniqueCarrier#8, 
count(1) AS count#134L]
  :  +- Filter (NOT (Cancelled#21 = 0) && (CancellationCode#22 = A))
  : +- Filter (Dest#17 = ORD)
  :+- 
Relation[Year#0,Month#1,DayofMonth#2,DayOfWeek#3,DepTime#4,CRSDepTime#5,ArrTime#6,CRSArrTime#7,UniqueCarrier#8,FlightNum#9,TailNum#10,ActualElapsedTime#11,CRSElapsedTime#12,AirTime#13,ArrDelay#14,DepDelay#15,Origin#16,Dest#17,Distance#18,TaxiIn#19,TaxiOut#20,Cancelled#21,CancellationCode#22,Diverted#23,...
 5 more fields] csv
  +- Project [Origin#155, UniqueCarrier#147, count#92L AS total#97L]
 +- Aggregate [Origin#155, UniqueCarrier#147], [Origin#155, 
UniqueCarrier#147, count(1) AS count#92L]
+- Filter (Dest#156 = ORD)
   +- 
Relation[Year#139,Month#140,DayofMonth#141,DayOfWeek#142,DepTime#143,CRSDepTime#144,ArrTime#145,CRSArrTime#146,UniqueCarrier#147,FlightNum#148,TailNum#149,ActualElapsedTime#150,CRSElapsedTime#151,AirTime#152,ArrDelay#153,DepDelay#154,Origin#155,Dest#156,Distance#157,TaxiIn#158,TaxiOut#159,Cancelled#160,CancellationCode#161,Diverted#162,...
 5 more fields] csv

== Analyzed Logical Plan ==
Origin: string, UniqueCarrier: string, rank: double
Project [Origin#16, UniqueCarrier#8, round((cast((count#134L * cast(100 as 
bigint)) as double) / cast(total#97L as double)), 2) AS rank#173]
+- Project [Origin#16, UniqueCarrier#8, count#134L, total#97L]
   +- Join Inner, ((Origin#16 = Origin#155) && (UniqueCarrier#8 = 
UniqueCarrier#147))
  :- Aggregate [Origin#16, UniqueCarrier#8], [Origin#16, UniqueCarrier#8, 
count(1) AS count#134L]
  :  +- Filter (NOT (Cancelled#21 = 0) && (CancellationCode#22 = A))
  : +- Filter (Dest#17 = ORD)
  :+- 
Relation[Year#0,Month#1,DayofMonth#2,DayOfWeek#3,DepTime#4,CRSDepTime#5,ArrTime#6,CRSArrTime#7,UniqueCarrier#8,FlightNum#9,TailNum#10,ActualElapsedTime#11,CRSElapsedTime#12,AirTime#13,ArrDelay#14,DepDelay#15,Origin#16,Dest#17,Distance#18,TaxiIn#19,TaxiOut#20,Cancelled#21,CancellationCode#22,Diverted#23,...
 5 more fields] csv
  +- Project [Origin#155, UniqueCarrier#147, count#92L AS total#97L]
 +- Aggregate [Origin#155, UniqueCarrier#147], [Origin#155, 
UniqueCarrier#147, count(1) AS count#92L]
+- Filter (Dest#156 = ORD)
   +- 
Relation[Year#139,Month#140,DayofMonth#141,DayOfWeek#142,DepTime#143,CRSDepTime#144,ArrTime#145,CRSArrTime#146,UniqueCarrier#147,FlightNum#148,TailNum#149,ActualElapsedTime#150,CRSElapsedTime#151,AirTime#152,ArrDelay#153,DepDelay#154,Origin#155,Dest#156,Distance#157,TaxiIn#158,TaxiOut#159,Cancelled#160,CancellationCode#161,Diverted#162,...
 5 more fields] csv

== Optimized Logical Plan ==
Project [Origin#16, UniqueCarrier#8, round((cast((count#134L * 100) as double) 
/ cast(total#97L as double)), 2) AS rank#173]
+- Join Inner, ((Origin#16 = Origin#155) && (UniqueCarrier#8 = 
UniqueCarrier#147))
   :- Aggregate [Origin#16, UniqueCarrier#8], [Origin#16, UniqueCarrier#8, 
count(1) AS count#134L]
   :  +- Project [UniqueCarrier#8, Origin#16]
   : +- Filter (((isnotnull(Origin#16) && isnotnull(UniqueCarrier#8)) 
&& isnotnull(Cancelled#21)) && isnotnull(CancellationCode#22)) && NOT 
(Cancelled#21 = 0)) && (CancellationCode#22 = A)) && isnotnull(Dest#17)) && 
(Dest#17 = ORD))
   :+- 
Relation[Year#0,Month#1,DayofMonth#2,DayOfWeek#3,DepTime#4,CRSDepTime#5,ArrTime#6,CRSArrTime#7,UniqueCarrier#8,FlightNum#9,TailNum#10,ActualElapsedTime#11,CRSElapsedTime#12,AirTime#13,ArrDelay#14,DepDelay#15,Origin#16,Dest#17,Distance#18,TaxiIn#19,TaxiOut#20,Cancelled#21,CancellationCode#22,Diverted#23,...
 5 more fields] csv
   +- Aggregate [Origin#155, UniqueCarrier#147], [Origin#155, 
UniqueCarrier#147, count(1) AS total#97L]
  +- Project [UniqueCarrier#147, Origin#155]
 +- Filter (((isnotnull(UniqueCarrier#147) && isnotnull(Origin#155)) && 
isnotnull(Dest#156)) && (Dest#156 = ORD))
+- 
Relation[Year#139,Month#140,DayofMonth#141,DayOfWeek#142,DepTime#143,CRSDepTime#144,ArrTime#145,CRSArrTime#146,UniqueCarrier#147,FlightNum#148,TailNum#149,ActualElapsedTime#150,CRSElapsedTime#151,AirTime#152,ArrDelay#153,DepDelay#154,Origin#155,Dest#156,Distance#157,TaxiIn#158,TaxiOut#159,Cancelled#160,CancellationCode#161,Diverted#162,...
 5 more fields] csv

== Physical Plan ==
*Project [Origin#16, UniqueCarrier#8, round((cast((count#134L * 100) as 

[jira] [Commented] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String

2016-06-15 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15331370#comment-15331370
 ] 

Pete Robbins commented on SPARK-15822:
--

Chatting with [~hvanhovell], here is the current state. I can reproduce a segv 
using local[8] on an 8-core machine. It is intermittent, but many runs with 
e.g. local[2] produce no issues. The segv info is:

{noformat}
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x7fe8c118ca58, pid=3558, tid=140633451779840
#
# JRE version: OpenJDK Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14)
# Java VM: OpenJDK 64-Bit Server VM (25.91-b14 mixed mode linux-amd64 
compressed oops)
# Problematic frame:
# J 7467 C1 org.apache.spark.unsafe.Platform.getByte(Ljava/lang/Object;J)B (9 
bytes) @ 0x7fe8c118ca58 [0x7fe8c118ca20+0x38]
#
# Failed to write core dump. Core dumps have been disabled. To enable core 
dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#

---  T H R E A D  ---

Current thread (0x7fe858018800):  JavaThread "Executor task launch 
worker-3" daemon [_thread_in_Java, id=3698, 
stack(0x7fe7c6dfd000,0x7fe7c6efe000)]

siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 
0x00a09cf4

Registers:
RAX=0x7fe884ce5828, RBX=0x7fe884ce5828, RCX=0x7fe81e0a5360, 
RDX=0x00a09cf4
RSP=0x7fe7c6efb9e0, RBP=0x7fe7c6efba80, RSI=0x, 
RDI=0x3848
R8 =0x200b94c8, R9 =0xeef66bf0, R10=0x7fe8d87a2f00, 
R11=0x7fe8c118ca20
R12=0x, R13=0x7fe7c6efba28, R14=0x7fe7c6efba98, 
R15=0x7fe858018800
RIP=0x7fe8c118ca58, EFLAGS=0x00010206, CSGSFS=0x0033, 
ERR=0x0004
  TRAPNO=0x000e

Top of Stack: (sp=0x7fe7c6efb9e0)
0x7fe7c6efb9e0:   7fe7c56941e8 
0x7fe7c6efb9f0:   7fe7c6efbab0 7fe8c140c38c
0x7fe7c6efba00:   7fe8c1007d80 eef66bc8
0x7fe7c6efba10:   7fe7c6efba80 7fe8c1007700
0x7fe7c6efba20:   7fe8c1007700 00a09cf4
0x7fe7c6efba30:   0030 
0x7fe7c6efba40:   7fe7c6efba40 7fe81e0a1f9b
0x7fe7c6efba50:   7fe7c6efba98 7fe81e0a5360
0x7fe7c6efba60:    7fe81e0a1fc0
0x7fe7c6efba70:   7fe7c6efba28 7fe7c6efba90
0x7fe7c6efba80:   7fe7c6efbae8 7fe8c1007700
0x7fe7c6efba90:    ee4f4898
0x7fe7c6efbaa0:   004d 7fe7c6efbaa8
0x7fe7c6efbab0:   7fe81e0a42be 7fe7c6efbb18
0x7fe7c6efbac0:   7fe81e0a5360 
0x7fe7c6efbad0:   7fe81e0a4338 7fe7c6efba90
0x7fe7c6efbae0:   7fe7c6efbb10 7fe7c6efbb60
0x7fe7c6efbaf0:   7fe8c1007a40 
0x7fe7c6efbb00:    0003
0x7fe7c6efbb10:   ee4f4898 eef67950
0x7fe7c6efbb20:   7fe7c6efbb20 7fe81e0a43f2
0x7fe7c6efbb30:   7fe7c6efbb78 7fe81e0a5360
0x7fe7c6efbb40:    7fe81e0a4418
0x7fe7c6efbb50:   7fe7c6efbb10 7fe7c6efbb70
0x7fe7c6efbb60:   7fe7c6efbbc0 7fe8c1007a40
0x7fe7c6efbb70:   ee4f4898 eef67950
0x7fe7c6efbb80:   7fe7c6efbb80 7fe7c56844e5
0x7fe7c6efbb90:   7fe7c6efbc28 7fe7c5684950
0x7fe7c6efbba0:    7fe7c5684618
0x7fe7c6efbbb0:   7fe7c6efbb70 7fe7c6efbc18
0x7fe7c6efbbc0:   7fe7c6efbc70 7fe8c10077d0
0x7fe7c6efbbd0:     

Instructions: (pc=0x7fe8c118ca58)
0x7fe8c118ca38:   08 83 c7 08 89 78 08 48 b8 28 58 ce 84 e8 7f 00
0x7fe8c118ca48:   00 81 e7 f8 3f 00 00 83 ff 00 0f 84 16 00 00 00
0x7fe8c118ca58:   0f be 04 16 c1 e0 18 c1 f8 18 48 83 c4 30 5d 85
0x7fe8c118ca68:   05 93 c6 85 17 c3 48 89 44 24 08 48 c7 04 24 ff 

Register to memory mapping:

RAX={method} {0x7fe884ce5828} 'getByte' '(Ljava/lang/Object;J)B' in 
'org/apache/spark/unsafe/Platform'
RBX={method} {0x7fe884ce5828} 'getByte' '(Ljava/lang/Object;J)B' in 
'org/apache/spark/unsafe/Platform'
RCX=0x7fe81e0a5360 is pointing into metadata
RDX=0x00a09cf4 is an unknown value
RSP=0x7fe7c6efb9e0 is pointing into the stack for thread: 0x7fe858018800
RBP=0x7fe7c6efba80 is pointing into the stack for thread: 0x7fe858018800
RSI=0x is an unknown value
RDI=0x3848 is an unknown value
R8 =0x200b94c8 is an unknown value
R9 =0xeef66bf0 is an oop
[B 
 - klass: {type array byte}
 - length: 48
R10=0x7fe8d87a2f00:  in 
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.91-0.b14.el6_7.x86_64/jre/lib/amd64/server/libjvm.so
 at 

[jira] [Commented] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String

2016-06-15 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15331371#comment-15331371
 ] 

Pete Robbins commented on SPARK-15822:
--

The generated code is:

{code}
Top Arrival Carrier Cancellations:
Found 5 WholeStageCodegen subtrees.
== Subtree 1 / 5 ==
*HashAggregate(key=[Origin#16,UniqueCarrier#8], functions=[partial_count(1)], 
output=[Origin#16,UniqueCarrier#8,count#296L])
+- *Project [UniqueCarrier#8, Origin#16]
   +- *Filter (((isnotnull(Origin#16) && isnotnull(UniqueCarrier#8)) && 
isnotnull(Cancelled#21)) && isnotnull(CancellationCode#22)) && NOT 
(Cancelled#21 = 0)) && (CancellationCode#22 = A)) && isnotnull(Dest#17)) && 
(Dest#17 = ORD))
  +- *Scan csv 
[UniqueCarrier#8,Origin#16,Dest#17,Cancelled#21,CancellationCode#22] Format: 
CSV, InputPaths: file:/home/robbins/brandberry/2008.csv, PushedFilters: 
[IsNotNull(Origin), IsNotNull(UniqueCarrier), IsNotNull(Cancelled), 
IsNotNull(CancellationCode), ..., ReadSchema: 
struct

Generated code:
/* 001 */ public Object generate(Object[] references) {
/* 002 */   return new GeneratedIterator(references);
/* 003 */ }
/* 004 */
/* 005 */ final class GeneratedIterator extends 
org.apache.spark.sql.execution.BufferedRowIterator {
/* 006 */   private Object[] references;
/* 007 */   private boolean agg_initAgg;
/* 008 */   private boolean agg_bufIsNull;
/* 009 */   private long agg_bufValue;
/* 010 */   private agg_VectorizedHashMap agg_vectorizedHashMap;
/* 011 */   private 
java.util.Iterator 
agg_vectorizedHashMapIter;
/* 012 */   private org.apache.spark.sql.execution.aggregate.HashAggregateExec 
agg_plan;
/* 013 */   private 
org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap agg_hashMap;
/* 014 */   private org.apache.spark.sql.execution.UnsafeKVExternalSorter 
agg_sorter;
/* 015 */   private org.apache.spark.unsafe.KVIterator agg_mapIter;
/* 016 */   private org.apache.spark.sql.execution.metric.SQLMetric 
agg_peakMemory;
/* 017 */   private org.apache.spark.sql.execution.metric.SQLMetric 
agg_spillSize;
/* 018 */   private org.apache.spark.sql.execution.metric.SQLMetric 
scan_numOutputRows;
/* 019 */   private scala.collection.Iterator scan_input;
/* 020 */   private org.apache.spark.sql.execution.metric.SQLMetric 
filter_numOutputRows;
/* 021 */   private UnsafeRow filter_result;
/* 022 */   private 
org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder filter_holder;
/* 023 */   private 
org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter 
filter_rowWriter;
/* 024 */   private UnsafeRow project_result;
/* 025 */   private 
org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder project_holder;
/* 026 */   private 
org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter 
project_rowWriter;
/* 027 */   private UnsafeRow agg_result2;
/* 028 */   private 
org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder agg_holder;
/* 029 */   private 
org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter agg_rowWriter;
/* 030 */   private 
org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowJoiner 
agg_unsafeRowJoiner;
/* 031 */   private org.apache.spark.sql.execution.metric.SQLMetric 
wholestagecodegen_numOutputRows;
/* 032 */   private org.apache.spark.sql.execution.metric.SQLMetric 
wholestagecodegen_aggTime;
/* 033 */   private UnsafeRow wholestagecodegen_result;
/* 034 */   private 
org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder 
wholestagecodegen_holder;
/* 035 */   private 
org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter 
wholestagecodegen_rowWriter;
/* 036 */
/* 037 */   public GeneratedIterator(Object[] references) {
/* 038 */ this.references = references;
/* 039 */   }
/* 040 */
/* 041 */   public void init(int index, scala.collection.Iterator inputs[]) {
/* 042 */ partitionIndex = index;
/* 043 */ agg_initAgg = false;
/* 044 */
/* 045 */ agg_vectorizedHashMap = new agg_VectorizedHashMap();
/* 046 */
/* 047 */ this.agg_plan = 
(org.apache.spark.sql.execution.aggregate.HashAggregateExec) references[0];
/* 048 */
/* 049 */ this.agg_peakMemory = 
(org.apache.spark.sql.execution.metric.SQLMetric) references[1];
/* 050 */ this.agg_spillSize = 
(org.apache.spark.sql.execution.metric.SQLMetric) references[2];
/* 051 */ this.scan_numOutputRows = 
(org.apache.spark.sql.execution.metric.SQLMetric) references[3];
/* 052 */ scan_input = inputs[0];
/* 053 */ this.filter_numOutputRows = 
(org.apache.spark.sql.execution.metric.SQLMetric) references[4];
/* 054 */ filter_result = new UnsafeRow(5);
/* 055 */ this.filter_holder = new 
org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(filter_result, 
128);
/* 056 */ this.filter_rowWriter = new 

[jira] [Commented] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String

2016-06-14 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15329474#comment-15329474
 ] 

Pete Robbins commented on SPARK-15822:
--

Modified the app to remove the .cache() calls and still get a segv on OpenJDK 8.

I may have been mistaken about it failing with 'spark.sql.codegen.wholeStage 
false', as I cannot reproduce it with that set.
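
For reference, a hedged sketch of how such a run can be configured; the app name is a 
placeholder, the off-heap settings come from the issue description, and whole-stage 
codegen is left at its default of true since I see no failures with it set to false:

{code}
import org.apache.spark.sql.SparkSession;

public class ReproConfig {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("segv-repro")                               // placeholder name
            .master("local[8]")                                  // parallelism that shows the failure
            .config("spark.memory.offHeap.enabled", "true")      // also reproduced with "false"
            .config("spark.memory.offHeap.size", "512m")
            // .config("spark.sql.codegen.wholeStage", "false")  // no failures seen with this set
            .getOrCreate();

        // Confirm the effective codegen setting (defaults to "true" in 2.0).
        System.out.println(spark.conf().get("spark.sql.codegen.wholeStage", "true"));
        spark.stop();
    }
}
{code}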

> segmentation violation in o.a.s.unsafe.types.UTF8String 
> 
>
> Key: SPARK-15822
> URL: https://issues.apache.org/jira/browse/SPARK-15822
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
> Environment: linux amd64
> openjdk version "1.8.0_91"
> OpenJDK Runtime Environment (build 1.8.0_91-b14)
> OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
>Reporter: Pete Robbins
>Assignee: Herman van Hovell
>Priority: Blocker
>
> Executors fail with segmentation violation while running application with
> spark.memory.offHeap.enabled true
> spark.memory.offHeap.size 512m
> Also now reproduced with 
> spark.memory.offHeap.enabled false
> {noformat}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7f4559b4d4bd, pid=14182, tid=139935319750400
> #
> # JRE version: OpenJDK Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14)
> # Java VM: OpenJDK 64-Bit Server VM (25.91-b14 mixed mode linux-amd64 
> compressed oops)
> # Problematic frame:
> # J 4816 C2 
> org.apache.spark.unsafe.types.UTF8String.compareTo(Lorg/apache/spark/unsafe/types/UTF8String;)I
>  (64 bytes) @ 0x7f4559b4d4bd [0x7f4559b4d460+0x5d]
> {noformat}
> We initially saw this on IBM java on PowerPC box but is recreatable on linux 
> with OpenJDK. On linux with IBM Java 8 we see a null pointer exception at the 
> same code point:
> {noformat}
> 16/06/08 11:14:58 ERROR Executor: Exception in task 1.0 in stage 5.0 (TID 48)
> java.lang.NullPointerException
>   at 
> org.apache.spark.unsafe.types.UTF8String.compareTo(UTF8String.java:831)
>   at org.apache.spark.unsafe.types.UTF8String.compare(UTF8String.java:844)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.findNextInnerJoinRows$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$doExecute$2$$anon$2.hasNext(WholeStageCodegenExec.scala:377)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>   at 
> scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30)
>   at org.spark_project.guava.collect.Ordering.leastOf(Ordering.java:664)
>   at org.apache.spark.util.collection.Utils$.takeOrdered(Utils.scala:37)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1365)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1362)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:85)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1153)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.lang.Thread.run(Thread.java:785)
> {noformat}






[jira] [Commented] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String

2016-06-13 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15327992#comment-15327992
 ] 

Pete Robbins commented on SPARK-15822:
--

So this does seem to cause the NPE or SEGV intermittently, i.e. I get some clean 
runs. However, I added some tracing to detect when the UnsafeRow looks corrupt 
(baseObject = null, offset = massive) and I see these hits in every run, so I suspect 
there is always corruption but that it does not always lead to a visible failure. 
The app usually gives the appearance of success because Spark re-submits the lost 
tasks and restarts failing executors. A sketch of the kind of check I added follows; 
after it is what I think is the plan associated with one of the failing jobs:
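
This is only my approximation of that check, not the actual tracing diff; the class 
name and the address bound are assumptions:

{code}
// Hypothetical sanity check on the (baseObject, offset, sizeInBytes) triple backing an
// UnsafeRow / UTF8String. With on-heap memory the base should be a byte[]; a null base
// combined with an implausibly large "offset" is the corrupt pattern described above.
final class UnsafeRowSanity {

    // 48 bits covers user-space addresses on current x86-64; anything above is treated as bogus.
    private static final long MAX_PLAUSIBLE_ADDRESS = 1L << 48;

    static boolean looksCorrupt(Object baseObject, long offset, long sizeInBytes) {
        if (sizeInBytes < 0) {
            return true;                              // a negative length is never valid
        }
        if (baseObject == null) {
            // Off-heap case: offset must be a real native address.
            return offset <= 0 || offset >= MAX_PLAUSIBLE_ADDRESS;
        }
        // On-heap case: offset is relative to the byte[] and should be small and non-negative.
        return offset < 0 || offset >= MAX_PLAUSIBLE_ADDRESS;
    }
}
{code}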

== Parsed Logical Plan ==
'Project [unresolvedalias('Origin, None), unresolvedalias('UniqueCarrier, 
None), 'round((('count * 100) / 'total), 2) AS rank#927]
+- Project [Origin#16, UniqueCarrier#8, count#888L, total#851L]
   +- Join Inner, ((Origin#16 = Origin#909) && (UniqueCarrier#8 = 
UniqueCarrier#901))
  :- Aggregate [Origin#16, UniqueCarrier#8], [Origin#16, UniqueCarrier#8, 
count(1) AS count#888L]
  :  +- Filter (NOT (Cancelled#21 = 0) && (CancellationCode#22 = A))
  : +- Filter (Dest#17 = ORD)
  :+- 
Relation[Year#0,Month#1,DayofMonth#2,DayOfWeek#3,DepTime#4,CRSDepTime#5,ArrTime#6,CRSArrTime#7,UniqueCarrier#8,FlightNum#9,TailNum#10,ActualElapsedTime#11,CRSElapsedTime#12,AirTime#13,ArrDelay#14,DepDelay#15,Origin#16,Dest#17,Distance#18,TaxiIn#19,TaxiOut#20,Cancelled#21,CancellationCode#22,Diverted#23,CarrierDelay#24,WeatherDelay#25,NASDelay#26,SecurityDelay#27,LateAircraftDelay#28]
 csv
  +- Project [Origin#909, UniqueCarrier#901, count#846L AS total#851L]
 +- Aggregate [Origin#909, UniqueCarrier#901], [Origin#909, 
UniqueCarrier#901, count(1) AS count#846L]
+- Filter (Dest#910 = ORD)
   +- 
Relation[Year#893,Month#894,DayofMonth#895,DayOfWeek#896,DepTime#897,CRSDepTime#898,ArrTime#899,CRSArrTime#900,UniqueCarrier#901,FlightNum#902,TailNum#903,ActualElapsedTime#904,CRSElapsedTime#905,AirTime#906,ArrDelay#907,DepDelay#908,Origin#909,Dest#910,Distance#911,TaxiIn#912,TaxiOut#913,Cancelled#914,CancellationCode#915,Diverted#916,CarrierDelay#917,WeatherDelay#918,NASDelay#919,SecurityDelay#920,LateAircraftDelay#921]
 csv

== Analyzed Logical Plan ==
Origin: string, UniqueCarrier: string, rank: double
Project [Origin#16, UniqueCarrier#8, round((cast((count#888L * cast(100 as 
bigint)) as double) / cast(total#851L as double)), 2) AS rank#927]
+- Project [Origin#16, UniqueCarrier#8, count#888L, total#851L]
   +- Join Inner, ((Origin#16 = Origin#909) && (UniqueCarrier#8 = 
UniqueCarrier#901))
  :- Aggregate [Origin#16, UniqueCarrier#8], [Origin#16, UniqueCarrier#8, 
count(1) AS count#888L]
  :  +- Filter (NOT (Cancelled#21 = 0) && (CancellationCode#22 = A))
  : +- Filter (Dest#17 = ORD)
  :+- 
Relation[Year#0,Month#1,DayofMonth#2,DayOfWeek#3,DepTime#4,CRSDepTime#5,ArrTime#6,CRSArrTime#7,UniqueCarrier#8,FlightNum#9,TailNum#10,ActualElapsedTime#11,CRSElapsedTime#12,AirTime#13,ArrDelay#14,DepDelay#15,Origin#16,Dest#17,Distance#18,TaxiIn#19,TaxiOut#20,Cancelled#21,CancellationCode#22,Diverted#23,CarrierDelay#24,WeatherDelay#25,NASDelay#26,SecurityDelay#27,LateAircraftDelay#28]
 csv
  +- Project [Origin#909, UniqueCarrier#901, count#846L AS total#851L]
 +- Aggregate [Origin#909, UniqueCarrier#901], [Origin#909, 
UniqueCarrier#901, count(1) AS count#846L]
+- Filter (Dest#910 = ORD)
   +- 
Relation[Year#893,Month#894,DayofMonth#895,DayOfWeek#896,DepTime#897,CRSDepTime#898,ArrTime#899,CRSArrTime#900,UniqueCarrier#901,FlightNum#902,TailNum#903,ActualElapsedTime#904,CRSElapsedTime#905,AirTime#906,ArrDelay#907,DepDelay#908,Origin#909,Dest#910,Distance#911,TaxiIn#912,TaxiOut#913,Cancelled#914,CancellationCode#915,Diverted#916,CarrierDelay#917,WeatherDelay#918,NASDelay#919,SecurityDelay#920,LateAircraftDelay#921]
 csv

== Optimized Logical Plan ==
Project [Origin#16, UniqueCarrier#8, round((cast((count#888L * 100) as double) 
/ cast(total#851L as double)), 2) AS rank#927]
+- Join Inner, ((Origin#16 = Origin#909) && (UniqueCarrier#8 = 
UniqueCarrier#901))
   :- Aggregate [Origin#16, UniqueCarrier#8], [Origin#16, UniqueCarrier#8, 
count(1) AS count#888L]
   :  +- Project [UniqueCarrier#8, Origin#16]
   : +- Filter (isnotnull(UniqueCarrier#8) && isnotnull(Origin#16)) && 
isnotnull(Cancelled#21)) && isnotnull(CancellationCode#22)) && NOT 
(Cancelled#21 = 0)) && (CancellationCode#22 = A))
   :+- InMemoryRelation [Year#0, Month#1, DayofMonth#2, DayOfWeek#3, 
DepTime#4, CRSDepTime#5, ArrTime#6, CRSArrTime#7, UniqueCarrier#8, FlightNum#9, 
TailNum#10, ActualElapsedTime#11, CRSElapsedTime#12, AirTime#13, ArrDelay#14, 
DepDelay#15, Origin#16, Dest#17, Distance#18, TaxiIn#19, TaxiOut#20, 
Cancelled#21, CancellationCode#22, Diverted#23, CarrierDelay#24, 

[jira] [Commented] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String

2016-06-11 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325761#comment-15325761
 ] 

Pete Robbins commented on SPARK-15822:
--

Has failed on the latest branch-2.0 and on master. Currently using branch-2.0 at:

commit a790ac5793e1988895341fa878f947b09b275926
Author: yinxusen 
Date:   Wed Jun 8 09:18:04 2016 +0100



> segmentation violation in o.a.s.unsafe.types.UTF8String 
> 
>
> Key: SPARK-15822
> URL: https://issues.apache.org/jira/browse/SPARK-15822
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
> Environment: linux amd64
> openjdk version "1.8.0_91"
> OpenJDK Runtime Environment (build 1.8.0_91-b14)
> OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
>Reporter: Pete Robbins
>Assignee: Herman van Hovell
>Priority: Blocker
>
> Executors fail with segmentation violation while running application with
> spark.memory.offHeap.enabled true
> spark.memory.offHeap.size 512m
> Also now reproduced with 
> spark.memory.offHeap.enabled false
> {noformat}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7f4559b4d4bd, pid=14182, tid=139935319750400
> #
> # JRE version: OpenJDK Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14)
> # Java VM: OpenJDK 64-Bit Server VM (25.91-b14 mixed mode linux-amd64 
> compressed oops)
> # Problematic frame:
> # J 4816 C2 
> org.apache.spark.unsafe.types.UTF8String.compareTo(Lorg/apache/spark/unsafe/types/UTF8String;)I
>  (64 bytes) @ 0x7f4559b4d4bd [0x7f4559b4d460+0x5d]
> {noformat}
> We initially saw this on IBM java on PowerPC box but is recreatable on linux 
> with OpenJDK. On linux with IBM Java 8 we see a null pointer exception at the 
> same code point:
> {noformat}
> 16/06/08 11:14:58 ERROR Executor: Exception in task 1.0 in stage 5.0 (TID 48)
> java.lang.NullPointerException
>   at 
> org.apache.spark.unsafe.types.UTF8String.compareTo(UTF8String.java:831)
>   at org.apache.spark.unsafe.types.UTF8String.compare(UTF8String.java:844)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.findNextInnerJoinRows$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$doExecute$2$$anon$2.hasNext(WholeStageCodegenExec.scala:377)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>   at 
> scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30)
>   at org.spark_project.guava.collect.Ordering.leastOf(Ordering.java:664)
>   at org.apache.spark.util.collection.Utils$.takeOrdered(Utils.scala:37)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1365)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1362)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:85)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1153)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.lang.Thread.run(Thread.java:785)
> {noformat}






[jira] [Commented] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String

2016-06-11 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325759#comment-15325759
 ] 

Pete Robbins commented on SPARK-15822:
--

This stack trace is taken earlier, at the point where I detect that a UTF8String is 
being created from the corrupt UnsafeRow, as I'm trying to backtrack to the point of 
corruption. The stack trace reported earlier is the NPE, which occurs later, when the 
corrupt UTF8String is used.

Dumb question, but how do I post the plan?

> segmentation violation in o.a.s.unsafe.types.UTF8String 
> 
>
> Key: SPARK-15822
> URL: https://issues.apache.org/jira/browse/SPARK-15822
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
> Environment: linux amd64
> openjdk version "1.8.0_91"
> OpenJDK Runtime Environment (build 1.8.0_91-b14)
> OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
>Reporter: Pete Robbins
>Assignee: Herman van Hovell
>Priority: Blocker
>
> Executors fail with segmentation violation while running application with
> spark.memory.offHeap.enabled true
> spark.memory.offHeap.size 512m
> Also now reproduced with 
> spark.memory.offHeap.enabled false
> {noformat}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7f4559b4d4bd, pid=14182, tid=139935319750400
> #
> # JRE version: OpenJDK Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14)
> # Java VM: OpenJDK 64-Bit Server VM (25.91-b14 mixed mode linux-amd64 
> compressed oops)
> # Problematic frame:
> # J 4816 C2 
> org.apache.spark.unsafe.types.UTF8String.compareTo(Lorg/apache/spark/unsafe/types/UTF8String;)I
>  (64 bytes) @ 0x7f4559b4d4bd [0x7f4559b4d460+0x5d]
> {noformat}
> We initially saw this on IBM java on PowerPC box but is recreatable on linux 
> with OpenJDK. On linux with IBM Java 8 we see a null pointer exception at the 
> same code point:
> {noformat}
> 16/06/08 11:14:58 ERROR Executor: Exception in task 1.0 in stage 5.0 (TID 48)
> java.lang.NullPointerException
>   at 
> org.apache.spark.unsafe.types.UTF8String.compareTo(UTF8String.java:831)
>   at org.apache.spark.unsafe.types.UTF8String.compare(UTF8String.java:844)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.findNextInnerJoinRows$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$doExecute$2$$anon$2.hasNext(WholeStageCodegenExec.scala:377)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>   at 
> scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30)
>   at org.spark_project.guava.collect.Ordering.leastOf(Ordering.java:664)
>   at org.apache.spark.util.collection.Utils$.takeOrdered(Utils.scala:37)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1365)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1362)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:85)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1153)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.lang.Thread.run(Thread.java:785)
> {noformat}






[jira] [Commented] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String

2016-06-10 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325159#comment-15325159
 ] 

Pete Robbins commented on SPARK-15822:
--

I am forcing a system dump when I detect that a corrupt UTF8String is being created. 
This is using the IBM JVM because I can analyse the dump and see the stacks and 
object contents using Eclipse Memory Analyzer. A sketch of the instrumentation follows.
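
This is a reconstruction, not the actual patch. The com.ibm.jvm.Dump.SystemDump() call 
is the one visible in the stack further down, but the corruption predicate is an 
assumption, and the class compiles only on an IBM JDK:

{code}
// Hypothetical guard, callable from the UTF8String(Object base, long offset, int numBytes)
// constructor, that triggers an IBM JVM system dump the moment a suspicious triple is seen.
final class CorruptUtf8StringDump {

    static void checkAndDump(Object base, long offset, int numBytes) {
        if (looksCorrupt(base, offset, numBytes)) {
            com.ibm.jvm.Dump.SystemDump();   // IBM JDK only; writes a system (core) dump
        }
    }

    // Assumed stand-in for whatever check the tracing really uses.
    private static boolean looksCorrupt(Object base, long offset, int numBytes) {
        return numBytes < 0 || (base == null && offset <= 0);
    }
}
{code}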

So... with whole-stage codegen enabled we get a stack of:
java.lang.Thread @ 0x835f9838
|- at com.ibm.jvm.Dump.SystemDumpImpl()I (Native Method)
|- at com.ibm.jvm.Dump.SystemDump()V (Dump.java:139)
|- at org.apache.spark.unsafe.types.UTF8String.<init>(Ljava/lang/Object;JI)V (UTF8String.java:125(Compiled Code))
|- at org.apache.spark.unsafe.types.UTF8String.fromAddress(Ljava/lang/Object;JI)Lorg/apache/spark/unsafe/types/UTF8String; (UTF8String.java:102(Compiled Code))
|- at org.apache.spark.sql.catalyst.expressions.UnsafeRow.getUTF8String(I)Lorg/apache/spark/unsafe/types/UTF8String; (UnsafeRow.java:414(Compiled Code))
|- at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.sort_addToSorter$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$GeneratedIterator;)V (null)
|- at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext()V (null)
|- at org.apache.spark.sql.execution.BufferedRowIterator.hasNext()Z (BufferedRowIterator.java:43(Compiled Code))
|- at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext()Z (WholeStageCodegenExec.scala:361(Compiled Code))
|- at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.findNextInnerJoinRows$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$GeneratedIterator;Lscala/collection/Iterator;Lscala/collection/Iterator;)Z (null)
|- at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext()V (null)
|- at org.apache.spark.sql.execution.BufferedRowIterator.hasNext()Z (BufferedRowIterator.java:43(Compiled Code))
|- at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$doExecute$2$$anon$2.hasNext()Z (WholeStageCodegenExec.scala:377)
|- at scala.collection.Iterator$$anon$11.hasNext()Z (Iterator.scala:408(Compiled Code))
|- at scala.collection.convert.Wrappers$IteratorWrapper.hasNext()Z (Wrappers.scala:30)
|- at org.spark_project.guava.collect.Ordering.leastOf(Ljava/util/Iterator;I)Ljava/util/List; (Ordering.java:628)
|- at org.apache.spark.util.collection.Utils$.takeOrdered(Lscala/collection/Iterator;ILscala/math/Ordering;)Lscala/collection/Iterator; (Utils.scala:37)
|- at org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(Lscala/collection/Iterator;)Lscala/collection/Iterator; (RDD.scala:1365)
|- at 

[jira] [Comment Edited] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String

2016-06-10 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325093#comment-15325093
 ] 

Pete Robbins edited comment on SPARK-15822 at 6/10/16 7:17 PM:
---

How do I disable whole-stage codegen?

found it 

spark.sql.codegen.wholeStage false
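
A minimal sketch of applying that setting at runtime; everything except the conf 
key/value is a placeholder:

{code}
import org.apache.spark.sql.SparkSession;

public class ToggleWholeStageCodegen {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("codegen-toggle")   // placeholder name
            .master("local[2]")
            .getOrCreate();
        spark.conf().set("spark.sql.codegen.wholeStage", "false");  // disable whole-stage codegen
        // ... re-run the failing query here to compare behaviour ...
        spark.stop();
    }
}
{code}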


was (Author: robbinspg):
How do I disable whole-stage codegen?

> segmentation violation in o.a.s.unsafe.types.UTF8String 
> 
>
> Key: SPARK-15822
> URL: https://issues.apache.org/jira/browse/SPARK-15822
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
> Environment: linux amd64
> openjdk version "1.8.0_91"
> OpenJDK Runtime Environment (build 1.8.0_91-b14)
> OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
>Reporter: Pete Robbins
>Assignee: Herman van Hovell
>Priority: Blocker
>
> Executors fail with segmentation violation while running application with
> spark.memory.offHeap.enabled true
> spark.memory.offHeap.size 512m
> Also now reproduced with 
> spark.memory.offHeap.enabled false
> {noformat}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7f4559b4d4bd, pid=14182, tid=139935319750400
> #
> # JRE version: OpenJDK Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14)
> # Java VM: OpenJDK 64-Bit Server VM (25.91-b14 mixed mode linux-amd64 
> compressed oops)
> # Problematic frame:
> # J 4816 C2 
> org.apache.spark.unsafe.types.UTF8String.compareTo(Lorg/apache/spark/unsafe/types/UTF8String;)I
>  (64 bytes) @ 0x7f4559b4d4bd [0x7f4559b4d460+0x5d]
> {noformat}
> We initially saw this on IBM java on PowerPC box but is recreatable on linux 
> with OpenJDK. On linux with IBM Java 8 we see a null pointer exception at the 
> same code point:
> {noformat}
> 16/06/08 11:14:58 ERROR Executor: Exception in task 1.0 in stage 5.0 (TID 48)
> java.lang.NullPointerException
>   at 
> org.apache.spark.unsafe.types.UTF8String.compareTo(UTF8String.java:831)
>   at org.apache.spark.unsafe.types.UTF8String.compare(UTF8String.java:844)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.findNextInnerJoinRows$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$doExecute$2$$anon$2.hasNext(WholeStageCodegenExec.scala:377)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>   at 
> scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30)
>   at org.spark_project.guava.collect.Ordering.leastOf(Ordering.java:664)
>   at org.apache.spark.util.collection.Utils$.takeOrdered(Utils.scala:37)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1365)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1362)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:85)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1153)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.lang.Thread.run(Thread.java:785)
> {noformat}






[jira] [Commented] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String

2016-06-10 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325093#comment-15325093
 ] 

Pete Robbins commented on SPARK-15822:
--

How do I disable whole-stage codegen?

> segmentation violation in o.a.s.unsafe.types.UTF8String 
> 
>
> Key: SPARK-15822
> URL: https://issues.apache.org/jira/browse/SPARK-15822
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
> Environment: linux amd64
> openjdk version "1.8.0_91"
> OpenJDK Runtime Environment (build 1.8.0_91-b14)
> OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
>Reporter: Pete Robbins
>Assignee: Herman van Hovell
>Priority: Blocker
>
> Executors fail with segmentation violation while running application with
> spark.memory.offHeap.enabled true
> spark.memory.offHeap.size 512m
> Also now reproduced with 
> spark.memory.offHeap.enabled false
> {noformat}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7f4559b4d4bd, pid=14182, tid=139935319750400
> #
> # JRE version: OpenJDK Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14)
> # Java VM: OpenJDK 64-Bit Server VM (25.91-b14 mixed mode linux-amd64 
> compressed oops)
> # Problematic frame:
> # J 4816 C2 
> org.apache.spark.unsafe.types.UTF8String.compareTo(Lorg/apache/spark/unsafe/types/UTF8String;)I
>  (64 bytes) @ 0x7f4559b4d4bd [0x7f4559b4d460+0x5d]
> {noformat}
> We initially saw this on IBM java on PowerPC box but is recreatable on linux 
> with OpenJDK. On linux with IBM Java 8 we see a null pointer exception at the 
> same code point:
> {noformat}
> 16/06/08 11:14:58 ERROR Executor: Exception in task 1.0 in stage 5.0 (TID 48)
> java.lang.NullPointerException
>   at 
> org.apache.spark.unsafe.types.UTF8String.compareTo(UTF8String.java:831)
>   at org.apache.spark.unsafe.types.UTF8String.compare(UTF8String.java:844)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.findNextInnerJoinRows$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$doExecute$2$$anon$2.hasNext(WholeStageCodegenExec.scala:377)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>   at 
> scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30)
>   at org.spark_project.guava.collect.Ordering.leastOf(Ordering.java:664)
>   at org.apache.spark.util.collection.Utils$.takeOrdered(Utils.scala:37)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1365)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1362)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:85)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1153)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.lang.Thread.run(Thread.java:785)
> {noformat}






[jira] [Commented] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String

2016-06-10 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324506#comment-15324506
 ] 

Pete Robbins commented on SPARK-15822:
--

generated SMJ code from the stack:

{code}
public Object generate(Object[] references) {
return new GeneratedIterator(references);
}

/*wholestagecodegen_c1*/
final class GeneratedIterator extends 
org.apache.spark.sql.execution.BufferedRowIterator {
private Object[] references;
private scala.collection.Iterator smj_leftInput;
private scala.collection.Iterator smj_rightInput;
private InternalRow smj_leftRow;
private InternalRow smj_rightRow;
private UTF8String smj_value4;
private UTF8String smj_value5;
private java.util.ArrayList smj_matches;
private UTF8String smj_value6;
private UTF8String smj_value7;
private UTF8String smj_value8;
private boolean smj_isNull4;
private UTF8String smj_value9;
private boolean smj_isNull5;
private long smj_value10;
private org.apache.spark.sql.execution.metric.SQLMetric smj_numOutputRows;
private UnsafeRow smj_result;
private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder 
smj_holder;
private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter 
smj_rowWriter;
private UnsafeRow project_result;
private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder 
project_holder;
private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter 
project_rowWriter;

public GeneratedIterator(Object[] references) {
this.references = references;
}

public void init(int index, scala.collection.Iterator inputs[]) {
partitionIndex = index;
smj_leftInput = inputs[0];
smj_rightInput = inputs[1];

smj_rightRow = null;

smj_matches = new java.util.ArrayList();

this.smj_numOutputRows = (org.apache.spark.sql.execution.metric.SQLMetric) 
references[0];
smj_result = new UnsafeRow(6);
this.smj_holder = new 
org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(smj_result, 128);
this.smj_rowWriter = new 
org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(smj_holder, 
6);
project_result = new UnsafeRow(3);
this.project_holder = new 
org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(project_result, 
64);
this.project_rowWriter = new 
org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(project_holder,
 3);
}

private boolean findNextInnerJoinRows(
scala.collection.Iterator leftIter,
scala.collection.Iterator rightIter) {
smj_leftRow = null;
int comp = 0;
while (smj_leftRow == null) {
if (!leftIter.hasNext()) return false;
smj_leftRow = (InternalRow) leftIter.next();
/*smj_c1*/
boolean smj_isNull = smj_leftRow.isNullAt(0);
UTF8String smj_value = smj_isNull ? null : (smj_leftRow.getUTF8String(0));
/*smj_c2*/
boolean smj_isNull1 = smj_leftRow.isNullAt(1);
UTF8String smj_value1 = smj_isNull1 ? null : (smj_leftRow.getUTF8String(1));
if (smj_isNull || smj_isNull1) {
smj_leftRow = null;
continue;
}
if (!smj_matches.isEmpty()) {
comp = 0;
if (comp == 0) {
comp = smj_value.compare(smj_value6);
}
if (comp == 0) {
comp = smj_value1.compare(smj_value7);
}

if (comp == 0) {
return true;
}
smj_matches.clear();
}

do {
if (smj_rightRow == null) {
if (!rightIter.hasNext()) {
smj_value6 = smj_value;

smj_value7 = smj_value1;

return !smj_matches.isEmpty();
}
smj_rightRow = (InternalRow) rightIter.next();
/*smj_c3*/
boolean smj_isNull2 = smj_rightRow.isNullAt(0);
UTF8String smj_value2 = smj_isNull2 ? null : (smj_rightRow.getUTF8String(0));
/*smj_c4*/
boolean smj_isNull3 = smj_rightRow.isNullAt(1);
UTF8String smj_value3 = smj_isNull3 ? null : (smj_rightRow.getUTF8String(1));
if (smj_isNull2 || smj_isNull3) {
smj_rightRow = null;
continue;
}

smj_value4 = smj_value2;

smj_value5 = smj_value3;

}

comp = 0;
if (comp == 0) {
comp = smj_value.compare(smj_value4);
}
if (comp == 0) {
comp = smj_value1.compare(smj_value5);
}

if (comp > 0) {
smj_rightRow = null;
} else if (comp < 0) {
if (!smj_matches.isEmpty()) {
smj_value6 = smj_value;

smj_value7 = smj_value1;

return true;
}
smj_leftRow = null;
} else {
smj_matches.add(smj_rightRow.copy());
smj_rightRow = null;;
}
} while (smj_leftRow != null);
}
return false; // unreachable
}

protected void processNext() throws java.io.IOException {
/*project_c*/
/*smj_c*/
while (findNextInnerJoinRows(smj_leftInput, smj_rightInput)) {
int smj_size = smj_matches.size();
smj_isNull4 = smj_leftRow.isNullAt(0);
smj_value8 = smj_isNull4 ? null : (smj_leftRow.getUTF8String(0));
smj_isNull5 = smj_leftRow.isNullAt(1);
smj_value9 = smj_isNull5 ? null : (smj_leftRow.getUTF8String(1));
smj_value10 = smj_leftRow.getLong(2);
for (int smj_i = 0; smj_i < smj_size; smj_i ++) {
InternalRow smj_rightRow1 = (InternalRow) smj_matches.get(smj_i);

smj_numOutputRows.add(1);

/*project_c1*/
/*wholestagecodegen_c*/
/*project_c7*/
/*smj_c7*/
long smj_value13 = smj_rightRow1.getLong(2);
boolean project_isNull8 = false;
double project_value8 = -1.0;
if (!false) {
project_value8 = 

[jira] [Updated] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String

2016-06-10 Thread Pete Robbins (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pete Robbins updated SPARK-15822:
-
Description: 
Executors fail with segmentation violation while running application with
spark.memory.offHeap.enabled true
spark.memory.offHeap.size 512m

Also now reproduced with 
spark.memory.offHeap.enabled false

{noformat}
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x7f4559b4d4bd, pid=14182, tid=139935319750400
#
# JRE version: OpenJDK Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14)
# Java VM: OpenJDK 64-Bit Server VM (25.91-b14 mixed mode linux-amd64 
compressed oops)
# Problematic frame:
# J 4816 C2 
org.apache.spark.unsafe.types.UTF8String.compareTo(Lorg/apache/spark/unsafe/types/UTF8String;)I
 (64 bytes) @ 0x7f4559b4d4bd [0x7f4559b4d460+0x5d]
{noformat}
We initially saw this on IBM java on PowerPC box but is recreatable on linux 
with OpenJDK. On linux with IBM Java 8 we see a null pointer exception at the 
same code point:
{noformat}
16/06/08 11:14:58 ERROR Executor: Exception in task 1.0 in stage 5.0 (TID 48)
java.lang.NullPointerException
at 
org.apache.spark.unsafe.types.UTF8String.compareTo(UTF8String.java:831)
at org.apache.spark.unsafe.types.UTF8String.compare(UTF8String.java:844)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.findNextInnerJoinRows$(Unknown
 Source)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$doExecute$2$$anon$2.hasNext(WholeStageCodegenExec.scala:377)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at 
scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30)
at org.spark_project.guava.collect.Ordering.leastOf(Ordering.java:664)
at org.apache.spark.util.collection.Utils$.takeOrdered(Utils.scala:37)
at 
org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1365)
at 
org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1362)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:85)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1153)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.lang.Thread.run(Thread.java:785)
{noformat}

  was:
Executors fail with segmentation violation while running application with
spark.memory.offHeap.enabled true
spark.memory.offHeap.size 512m
{noformat}
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x7f4559b4d4bd, pid=14182, tid=139935319750400
#
# JRE version: OpenJDK Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14)
# Java VM: OpenJDK 64-Bit Server VM (25.91-b14 mixed mode linux-amd64 
compressed oops)
# Problematic frame:
# J 4816 C2 
org.apache.spark.unsafe.types.UTF8String.compareTo(Lorg/apache/spark/unsafe/types/UTF8String;)I
 (64 bytes) @ 0x7f4559b4d4bd [0x7f4559b4d460+0x5d]
{noformat}
We initially saw this on IBM java on PowerPC box but is recreatable on linux 
with OpenJDK. On linux with IBM Java 8 we see a null pointer exception at the 
same code point:
{noformat}
16/06/08 11:14:58 ERROR Executor: Exception in task 1.0 in stage 5.0 (TID 48)
java.lang.NullPointerException
at 
org.apache.spark.unsafe.types.UTF8String.compareTo(UTF8String.java:831)
at org.apache.spark.unsafe.types.UTF8String.compare(UTF8String.java:844)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.findNextInnerJoinRows$(Unknown
 Source)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$doExecute$2$$anon$2.hasNext(WholeStageCodegenExec.scala:377)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at 

[jira] [Updated] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String

2016-06-10 Thread Pete Robbins (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pete Robbins updated SPARK-15822:
-
Summary: segmentation violation in o.a.s.unsafe.types.UTF8String   (was: 
segmentation violation in o.a.s.unsafe.types.UTF8String with 
spark.memory.offHeap.enabled=true)

> segmentation violation in o.a.s.unsafe.types.UTF8String 
> 
>
> Key: SPARK-15822
> URL: https://issues.apache.org/jira/browse/SPARK-15822
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
> Environment: linux amd64
> openjdk version "1.8.0_91"
> OpenJDK Runtime Environment (build 1.8.0_91-b14)
> OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
>Reporter: Pete Robbins
>Assignee: Herman van Hovell
>Priority: Blocker
>
> Executors fail with segmentation violation while running application with
> spark.memory.offHeap.enabled true
> spark.memory.offHeap.size 512m
> {noformat}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7f4559b4d4bd, pid=14182, tid=139935319750400
> #
> # JRE version: OpenJDK Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14)
> # Java VM: OpenJDK 64-Bit Server VM (25.91-b14 mixed mode linux-amd64 
> compressed oops)
> # Problematic frame:
> # J 4816 C2 
> org.apache.spark.unsafe.types.UTF8String.compareTo(Lorg/apache/spark/unsafe/types/UTF8String;)I
>  (64 bytes) @ 0x7f4559b4d4bd [0x7f4559b4d460+0x5d]
> {noformat}
> We initially saw this with IBM Java on a PowerPC box, but it is recreatable on Linux 
> with OpenJDK. On Linux with IBM Java 8 we see a null pointer exception at the 
> same code point:
> {noformat}
> 16/06/08 11:14:58 ERROR Executor: Exception in task 1.0 in stage 5.0 (TID 48)
> java.lang.NullPointerException
>   at 
> org.apache.spark.unsafe.types.UTF8String.compareTo(UTF8String.java:831)
>   at org.apache.spark.unsafe.types.UTF8String.compare(UTF8String.java:844)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.findNextInnerJoinRows$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$doExecute$2$$anon$2.hasNext(WholeStageCodegenExec.scala:377)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>   at 
> scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30)
>   at org.spark_project.guava.collect.Ordering.leastOf(Ordering.java:664)
>   at org.apache.spark.util.collection.Utils$.takeOrdered(Utils.scala:37)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1365)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1362)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:85)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1153)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.lang.Thread.run(Thread.java:785)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String with spark.memory.offHeap.enabled=true

2016-06-10 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324331#comment-15324331
 ] 

Pete Robbins commented on SPARK-15822:
--

I'm still looking into this, tracing back through the code using Memory Analyzer 
on the core dumps.

Currently on the stack we have the following generated code:

{code}
public Object generate(Object[] references) {
return new GeneratedIterator(references);
}

final class GeneratedIterator extends 
org.apache.spark.sql.execution.BufferedRowIterator {
private Object[] references;
private boolean sort_needToSort;
private org.apache.spark.sql.execution.SortExec sort_plan;
private org.apache.spark.sql.execution.UnsafeExternalRowSorter sort_sorter;
private org.apache.spark.executor.TaskMetrics sort_metrics;
private scala.collection.Iterator sort_sortedIter;
private boolean agg_initAgg;
private boolean agg_bufIsNull;
private long agg_bufValue;
private org.apache.spark.sql.execution.aggregate.HashAggregateExec agg_plan;
private org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap 
agg_hashMap;
private org.apache.spark.sql.execution.UnsafeKVExternalSorter agg_sorter;
private org.apache.spark.unsafe.KVIterator agg_mapIter;
private org.apache.spark.sql.execution.metric.SQLMetric agg_peakMemory;
private org.apache.spark.sql.execution.metric.SQLMetric agg_spillSize;
private scala.collection.Iterator inputadapter_input;
private UnsafeRow agg_result;
private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder 
agg_holder;
private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter 
agg_rowWriter;
private UnsafeRow agg_result1;
private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder 
agg_holder1;
private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter 
agg_rowWriter1;
private org.apache.spark.sql.execution.metric.SQLMetric sort_numOutputRows;
private org.apache.spark.sql.execution.metric.SQLMetric sort_aggTime;
private org.apache.spark.sql.execution.metric.SQLMetric sort_peakMemory;
private org.apache.spark.sql.execution.metric.SQLMetric sort_spillSize;
private org.apache.spark.sql.execution.metric.SQLMetric sort_sortTime;

public GeneratedIterator(Object[] references) {
this.references = references;
}

public void init(int index, scala.collection.Iterator inputs[]) {
partitionIndex = index;
sort_needToSort = true;
this.sort_plan = (org.apache.spark.sql.execution.SortExec) references[0];
sort_sorter = sort_plan.createSorter();
sort_metrics = org.apache.spark.TaskContext.get().taskMetrics();

agg_initAgg = false;

this.agg_plan = (org.apache.spark.sql.execution.aggregate.HashAggregateExec) 
references[1];

this.agg_peakMemory = (org.apache.spark.sql.execution.metric.SQLMetric) 
references[2];
this.agg_spillSize = (org.apache.spark.sql.execution.metric.SQLMetric) 
references[3];
inputadapter_input = inputs[0];
agg_result = new UnsafeRow(2);
this.agg_holder = new 
org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(agg_result, 64);
this.agg_rowWriter = new 
org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(agg_holder, 
2);
agg_result1 = new UnsafeRow(3);
this.agg_holder1 = new 
org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(agg_result1, 64);
this.agg_rowWriter1 = new 
org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(agg_holder1, 
3);
this.sort_numOutputRows = (org.apache.spark.sql.execution.metric.SQLMetric) 
references[4];
this.sort_aggTime = (org.apache.spark.sql.execution.metric.SQLMetric) 
references[5];
this.sort_peakMemory = (org.apache.spark.sql.execution.metric.SQLMetric) 
references[6];
this.sort_spillSize = (org.apache.spark.sql.execution.metric.SQLMetric) 
references[7];
this.sort_sortTime = (org.apache.spark.sql.execution.metric.SQLMetric) 
references[8];
}

private void agg_doAggregateWithKeys() throws java.io.IOException {
agg_hashMap = agg_plan.createHashMap();

while (inputadapter_input.hasNext()) {
InternalRow inputadapter_row = (InternalRow) inputadapter_input.next();
boolean inputadapter_isNull = inputadapter_row.isNullAt(0);
UTF8String inputadapter_value = inputadapter_isNull ? null : 
(inputadapter_row.getUTF8String(0));
boolean inputadapter_isNull1 = inputadapter_row.isNullAt(1);
UTF8String inputadapter_value1 = inputadapter_isNull1 ? null : 
(inputadapter_row.getUTF8String(1));
long inputadapter_value2 = inputadapter_row.getLong(2);

UnsafeRow agg_unsafeRowAggBuffer = null;
org.apache.spark.sql.execution.vectorized.ColumnarBatch.Row 
agg_vectorizedAggBuffer = null;

if (agg_vectorizedAggBuffer == null) {
// generate grouping key
agg_holder.reset();

agg_rowWriter.zeroOutNullBytes();

if (inputadapter_isNull) {
agg_rowWriter.setNullAt(0);
} else {
agg_rowWriter.write(0, inputadapter_value);
}

if (inputadapter_isNull1) {
agg_rowWriter.setNullAt(1);
} else {
agg_rowWriter.write(1, inputadapter_value1);
}

[jira] [Commented] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String with spark.memory.offHeap.enabled=true

2016-06-08 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320380#comment-15320380
 ] 

Pete Robbins commented on SPARK-15822:
--

I'm investigating this and will attach the app and config later

> segmentation violation in o.a.s.unsafe.types.UTF8String with 
> spark.memory.offHeap.enabled=true
> --
>
> Key: SPARK-15822
> URL: https://issues.apache.org/jira/browse/SPARK-15822
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
> Environment: linux amd64
> openjdk version "1.8.0_91"
> OpenJDK Runtime Environment (build 1.8.0_91-b14)
> OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
>Reporter: Pete Robbins
>Priority: Critical
>
> Executors fail with segmentation violation while running application with
> spark.memory.offHeap.enabled true
> spark.memory.offHeap.size 512m
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7f4559b4d4bd, pid=14182, tid=139935319750400
> #
> # JRE version: OpenJDK Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14)
> # Java VM: OpenJDK 64-Bit Server VM (25.91-b14 mixed mode linux-amd64 
> compressed oops)
> # Problematic frame:
> # J 4816 C2 
> org.apache.spark.unsafe.types.UTF8String.compareTo(Lorg/apache/spark/unsafe/types/UTF8String;)I
>  (64 bytes) @ 0x7f4559b4d4bd [0x7f4559b4d460+0x5d]
> We initially saw this with IBM Java on a PowerPC box, but it is recreatable on Linux 
> with OpenJDK. On Linux with IBM Java 8 we see a null pointer exception at the 
> same code point:
> 16/06/08 11:14:58 ERROR Executor: Exception in task 1.0 in stage 5.0 (TID 48)
> java.lang.NullPointerException
>   at 
> org.apache.spark.unsafe.types.UTF8String.compareTo(UTF8String.java:831)
>   at org.apache.spark.unsafe.types.UTF8String.compare(UTF8String.java:844)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.findNextInnerJoinRows$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$doExecute$2$$anon$2.hasNext(WholeStageCodegenExec.scala:377)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>   at 
> scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30)
>   at org.spark_project.guava.collect.Ordering.leastOf(Ordering.java:664)
>   at org.apache.spark.util.collection.Utils$.takeOrdered(Utils.scala:37)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1365)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1362)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:85)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1153)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.lang.Thread.run(Thread.java:785)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String with spark.memory.offHeap.enabled=true

2016-06-08 Thread Pete Robbins (JIRA)
Pete Robbins created SPARK-15822:


 Summary: segmentation violation in o.a.s.unsafe.types.UTF8String 
with spark.memory.offHeap.enabled=true
 Key: SPARK-15822
 URL: https://issues.apache.org/jira/browse/SPARK-15822
 Project: Spark
  Issue Type: Bug
Affects Versions: 2.0.0
 Environment: linux amd64

openjdk version "1.8.0_91"
OpenJDK Runtime Environment (build 1.8.0_91-b14)
OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)

Reporter: Pete Robbins
Priority: Critical


Executors fail with segmentation violation while running application with
spark.memory.offHeap.enabled true
spark.memory.offHeap.size 512m

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x7f4559b4d4bd, pid=14182, tid=139935319750400
#
# JRE version: OpenJDK Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14)
# Java VM: OpenJDK 64-Bit Server VM (25.91-b14 mixed mode linux-amd64 
compressed oops)
# Problematic frame:
# J 4816 C2 
org.apache.spark.unsafe.types.UTF8String.compareTo(Lorg/apache/spark/unsafe/types/UTF8String;)I
 (64 bytes) @ 0x7f4559b4d4bd [0x7f4559b4d460+0x5d]

We initially saw this with IBM Java on a PowerPC box, but it is recreatable on Linux 
with OpenJDK. On Linux with IBM Java 8 we see a null pointer exception at the 
same code point:

16/06/08 11:14:58 ERROR Executor: Exception in task 1.0 in stage 5.0 (TID 48)
java.lang.NullPointerException
at 
org.apache.spark.unsafe.types.UTF8String.compareTo(UTF8String.java:831)
at org.apache.spark.unsafe.types.UTF8String.compare(UTF8String.java:844)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.findNextInnerJoinRows$(Unknown
 Source)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$doExecute$2$$anon$2.hasNext(WholeStageCodegenExec.scala:377)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at 
scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30)
at org.spark_project.guava.collect.Ordering.leastOf(Ordering.java:664)
at org.apache.spark.util.collection.Utils$.takeOrdered(Utils.scala:37)
at 
org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1365)
at 
org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1362)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:85)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1153)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.lang.Thread.run(Thread.java:785)




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15065) HiveSparkSubmitSuite's "set spark.sql.warehouse.dir" is flaky

2016-06-07 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15318492#comment-15318492
 ] 

Pete Robbins commented on SPARK-15065:
--

I think this may be related to 
https://issues.apache.org/jira/browse/SPARK-15606, where there is a deadlock in 
executor shutdown. This test was consistently failing on our machine with only 
2 cores, but since my fix for SPARK-15606 it has passed every time.

> HiveSparkSubmitSuite's "set spark.sql.warehouse.dir" is flaky
> -
>
> Key: SPARK-15065
> URL: https://issues.apache.org/jira/browse/SPARK-15065
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Reporter: Yin Huai
>Priority: Critical
> Attachments: log.txt
>
>
> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.4/861/testReport/junit/org.apache.spark.sql.hive/HiveSparkSubmitSuite/dir/
> There are several WARN messages like {{16/05/02 00:51:06 WARN Master: Got 
> status update for unknown executor app-20160502005054-/3}}, which are 
> suspicious. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15606) Driver hang in o.a.s.DistributedSuite on 2 core machine

2016-05-27 Thread Pete Robbins (JIRA)
Pete Robbins created SPARK-15606:


 Summary: Driver hang in o.a.s.DistributedSuite on 2 core machine
 Key: SPARK-15606
 URL: https://issues.apache.org/jira/browse/SPARK-15606
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.0.0
 Environment: AMD64 box with only 2 cores
Reporter: Pete Robbins


repeatedly failing task that crashes JVM *** FAILED ***
  The code passed to failAfter did not complete within 10 milliseconds. 
(DistributedSuite.scala:128)

This test started failing, and DistributedSuite hanging, following 
https://github.com/apache/spark/pull/13055

It looks like the extra message sent to remove the BlockManager deadlocks the 
dispatcher, as there are only 2 message-processing loop threads. Related to 
https://issues.apache.org/jira/browse/SPARK-13906

{code}
  /** Thread pool used for dispatching messages. */
  private val threadpool: ThreadPoolExecutor = {
val numThreads = 
nettyEnv.conf.getInt("spark.rpc.netty.dispatcher.numThreads",
  math.max(2, Runtime.getRuntime.availableProcessors()))
val pool = ThreadUtils.newDaemonFixedThreadPool(numThreads, 
"dispatcher-event-loop")
for (i <- 0 until numThreads) {
  pool.execute(new MessageLoop)
}
pool
  }

{code} 

Setting a minimum of 3 threads alleviates this issue but I'm not sure there 
isn't another underlying problem.
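
As an illustration, a minimal sketch of the workaround mentioned above (raising the default floor from 2 to 3 dispatcher threads); this is only the alleviation I tried, not necessarily the final fix:

{code}
  /** Thread pool used for dispatching messages. Sketch: floor raised from 2 to 3. */
  private val threadpool: ThreadPoolExecutor = {
    val numThreads = 
nettyEnv.conf.getInt("spark.rpc.netty.dispatcher.numThreads",
      math.max(3, Runtime.getRuntime.availableProcessors()))
    val pool = ThreadUtils.newDaemonFixedThreadPool(numThreads, 
"dispatcher-event-loop")
    for (i <- 0 until numThreads) {
      pool.execute(new MessageLoop)
    }
    pool
  }
{code}

With a floor of 3, one message loop can block on the BlockManager removal while another still drains the remaining messages; explicitly setting spark.rpc.netty.dispatcher.numThreads to 3 or more on a 2-core box should have a similar effect.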




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15154) LongHashedRelation test fails on Big Endian platform

2016-05-09 Thread Pete Robbins (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pete Robbins updated SPARK-15154:
-
Priority: Minor  (was: Major)
 Summary: LongHashedRelation test fails on Big Endian platform  (was: 
LongHashedRelation fails on Big Endian platform)

> LongHashedRelation test fails on Big Endian platform
> 
>
> Key: SPARK-15154
> URL: https://issues.apache.org/jira/browse/SPARK-15154
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Pete Robbins
>Priority: Minor
>  Labels: big-endian
>
> NPE in 
> org.apache.spark.sql.execution.joins.HashedRelationSuite.LongToUnsafeRowMap
> Error Message
> java.lang.NullPointerException was thrown.
> Stacktrace
>   java.lang.NullPointerException
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3$$anonfun$apply$mcV$sp$1.apply$mcVI$sp(HashedRelationSuite.scala:121)
>   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3.apply$mcV$sp(HashedRelationSuite.scala:119)
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3.apply(HashedRelationSuite.scala:112)
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3.apply(HashedRelationSuite.scala:112)
>   at 
> org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
>   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:57)
>   at 
> org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
>   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
>   at org.scalatest.FunSuite.runTest(FunSuite.scala:1555)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
>   at 
> org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
>   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
>   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
>   at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
>   at org.scalatest.Suite$class.run(Suite.scala:1424)
>   at 
> org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>   at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
>   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
>   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:29)
>   at 
> org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257)
>   at 
> org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256)
>   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:29)
>   at org.scalatest.Suite$class.callExecuteOnSuite$1(Suite.scala:1492)
>   at 
> org.scalatest.Suite$$anonfun$runNestedSuites$1.apply(Suite.scala:1528)
>   at 
> org.scalatest.Suite$$anonfun$runNestedSuites$1.apply(Suite.scala:1526)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
>   at org.scalatest.Suite$class.runNestedSuites(Suite.scala:1526)
>   at 
> org.scalatest.tools.DiscoverySuite.runNestedSuites(DiscoverySuite.scala:29)
>   at org.scalatest.Suite$class.run(Suite.scala:1421)
>   at org.scalatest.tools.DiscoverySuite.run(DiscoverySuite.scala:29)
>   at 

[jira] [Commented] (SPARK-15154) LongHashedRelation fails on Big Endian platform

2016-05-09 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276543#comment-15276543
 ] 

Pete Robbins commented on SPARK-15154:
--

I'm convinced the test is invalid. The creation of LongHashedRelation is 
guarded by

{code}
   if (key.length == 1 && key.head.dataType == LongType) {
  LongHashedRelation(input, key, sizeEstimate, mm)
}
{code}

In this failing test the key dataType is IntegerType

I'll submit a PR to fix the tests
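
For illustration only (the actual PR may look different), the sketch below shows the kind of change I mean: if the test wants to exercise LongHashedRelation it has to declare the key as LongType *and* store Long values, so that rowKey.getLong(0) reads back a value that was actually written as a Long:

{code}
// Hypothetical test setup, not the real HashedRelationSuite code:
// the bound key type and the row data are both Longs, matching the guard above.
val key = Seq(BoundReference(0, LongType, false))
val rows = (0 until 100).map(i => InternalRow(i.toLong, i + 1))
{code}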

> LongHashedRelation fails on Big Endian platform
> ---
>
> Key: SPARK-15154
> URL: https://issues.apache.org/jira/browse/SPARK-15154
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Pete Robbins
>  Labels: big-endian
>
> NPE in 
> org.apache.spark.sql.execution.joins.HashedRelationSuite.LongToUnsafeRowMap
> Error Message
> java.lang.NullPointerException was thrown.
> Stacktrace
>   java.lang.NullPointerException
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3$$anonfun$apply$mcV$sp$1.apply$mcVI$sp(HashedRelationSuite.scala:121)
>   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3.apply$mcV$sp(HashedRelationSuite.scala:119)
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3.apply(HashedRelationSuite.scala:112)
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3.apply(HashedRelationSuite.scala:112)
>   at 
> org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
>   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:57)
>   at 
> org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
>   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
>   at org.scalatest.FunSuite.runTest(FunSuite.scala:1555)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
>   at 
> org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
>   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
>   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
>   at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
>   at org.scalatest.Suite$class.run(Suite.scala:1424)
>   at 
> org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>   at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
>   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
>   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:29)
>   at 
> org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257)
>   at 
> org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256)
>   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:29)
>   at org.scalatest.Suite$class.callExecuteOnSuite$1(Suite.scala:1492)
>   at 
> org.scalatest.Suite$$anonfun$runNestedSuites$1.apply(Suite.scala:1528)
>   at 
> org.scalatest.Suite$$anonfun$runNestedSuites$1.apply(Suite.scala:1526)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
>   at org.scalatest.Suite$class.runNestedSuites(Suite.scala:1526)
>   at 
> org.scalatest.tools.DiscoverySuite.runNestedSuites(DiscoverySuite.scala:29)
>   at 

[jira] [Comment Edited] (SPARK-15154) LongHashedRelation fails on Big Endian platform

2016-05-06 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15273885#comment-15273885
 ] 

Pete Robbins edited comment on SPARK-15154 at 5/6/16 11:20 AM:
---

[~davies] as you are the author of this code, can you comment on my findings?

So the issue here is that the keyGenerator returns an UnsafeRow containing Int 
values, but the code below from LongHashedRelation.apply retrieves the key from 
it as a Long. The bytes in the row are:

on Little Endian: 01 00 00 00 00 00 00 00
on Big Endian:    00 00 00 01 00 00 00 00

By chance getInt and getLong will both return "1" on Little Endian, because the 
following 4 bytes happen to be 0, whereas on Big Endian getInt returns "1" but 
getLong will return "268435456"

{code}
val keyGenerator = UnsafeProjection.create(key)

// Create a mapping of key -> rows
var numFields = 0
while (input.hasNext) {
  val unsafeRow = input.next().asInstanceOf[UnsafeRow]
  numFields = unsafeRow.numFields()
  val rowKey = keyGenerator(unsafeRow)
  if (!rowKey.isNullAt(0)) {
val key = rowKey.getLong(0) // <<< Values in rowKey are Ints not 
Longs
map.append(key, unsafeRow)
  }
}
{code}
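
The effect is easy to reproduce outside Spark with a plain ByteBuffer. This is just an illustration of the byte layouts above (UnsafeRow actually goes through Unsafe in native byte order; the explicit orders here stand in for the two platforms):

{code}
import java.nio.{ByteBuffer, ByteOrder}

// Write an Int 1 into the first 4 bytes of an 8-byte word, then read the word
// back as an Int and as a Long in the given byte order.
def readBack(order: ByteOrder): (Int, Long) = {
  val buf = ByteBuffer.allocate(8).order(order)
  buf.putInt(0, 1)
  (buf.getInt(0), buf.getLong(0))
}

println(readBack(ByteOrder.LITTLE_ENDIAN)) // (1,1): the trailing zero bytes are harmless
println(readBack(ByteOrder.BIG_ENDIAN))    // (1,4294967296): the Int lands in the high half of the Long
{code}

The exact garbage value depends on what sits in the rest of the word, but the point is the same: reading an Int-sized field with getLong only happens to work on Little Endian.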



was (Author: robbinspg):
[~davies] as you are the author of this code can you comment on my findings?

So the issue here is that the keyGenerator returns an UnsafeRow containing Int 
values but the code below from LongHashedRelation.apply retrieves the key from 
this as a Long. The bytes in the row are

on Little Endian: 01 00 00 00 00 00 00 00 
on Big Endian:   00 00 00 01 00 00 00 00

By chance getInt and getLong  will both return "1" on Little Endian because the 
following 4 bytes happen to be 0, whereas on Big Endian getInt returns "1" but 
get Long will return "268435456"

{code}
val keyGenerator = UnsafeProjection.create(key)

// Create a mapping of key -> rows
var numFields = 0
while (input.hasNext) {
  val unsafeRow = input.next().asInstanceOf[UnsafeRow]
  numFields = unsafeRow.numFields()
  val rowKey = keyGenerator(unsafeRow)
  if (!rowKey.isNullAt(0)) {
val key = rowKey.getLong(0) // <<< Values in rowKey are Intsnot 
Longs
map.append(key, unsafeRow)
  }
}
{code}


> LongHashedRelation fails on Big Endian platform
> ---
>
> Key: SPARK-15154
> URL: https://issues.apache.org/jira/browse/SPARK-15154
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Pete Robbins
>  Labels: big-endian
>
> NPE in 
> org.apache.spark.sql.execution.joins.HashedRelationSuite.LongToUnsafeRowMap
> Error Message
> java.lang.NullPointerException was thrown.
> Stacktrace
>   java.lang.NullPointerException
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3$$anonfun$apply$mcV$sp$1.apply$mcVI$sp(HashedRelationSuite.scala:121)
>   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3.apply$mcV$sp(HashedRelationSuite.scala:119)
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3.apply(HashedRelationSuite.scala:112)
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3.apply(HashedRelationSuite.scala:112)
>   at 
> org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
>   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:57)
>   at 
> org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
>   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
>   at org.scalatest.FunSuite.runTest(FunSuite.scala:1555)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at 

[jira] [Comment Edited] (SPARK-15154) LongHashedRelation fails on Big Endian platform

2016-05-06 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15273885#comment-15273885
 ] 

Pete Robbins edited comment on SPARK-15154 at 5/6/16 11:14 AM:
---

[~davies] as you are the author of this code can you comment on my findings?

So the issue here is that the keyGenerator returns an UnsafeRow containing Int 
values, but the code below from LongHashedRelation.apply retrieves the key from 
it as a Long. The bytes in the row are:

on Little Endian: 01 00 00 00 00 00 00 00
on Big Endian:    00 00 00 01 00 00 00 00

By chance getInt and getLong will both return "1" on Little Endian, because the 
following 4 bytes happen to be 0, whereas on Big Endian getInt returns "1" but 
getLong will return "268435456"

{code}
val keyGenerator = UnsafeProjection.create(key)

// Create a mapping of key -> rows
var numFields = 0
while (input.hasNext) {
  val unsafeRow = input.next().asInstanceOf[UnsafeRow]
  numFields = unsafeRow.numFields()
  val rowKey = keyGenerator(unsafeRow)
  if (!rowKey.isNullAt(0)) {
val key = rowKey.getLong(0) // <<< Values in rowKey are Intsnot 
Longs
map.append(key, unsafeRow)
  }
}
{code}



was (Author: robbinspg):
[~davies] as you are the author of this code can you comment on my findings?

So the issue here is that the keyGenerator returns an UnsafeRow containing Int 
values but the code below from LongHashedRelation.apply retrieves the key from 
this as a Long. The bytes in the row are

on Little Endian: 01 00 00 00 00 00 00 00 
on Big Endian:   00 00 00 01 00 00 00 00

By chance getInt and getLong  will both return "1" on Little Endian whereas on 
Big Endian getInt returns "1" but get Long will return "268435456"

{code}
val keyGenerator = UnsafeProjection.create(key)

// Create a mapping of key -> rows
var numFields = 0
while (input.hasNext) {
  val unsafeRow = input.next().asInstanceOf[UnsafeRow]
  numFields = unsafeRow.numFields()
  val rowKey = keyGenerator(unsafeRow)
  if (!rowKey.isNullAt(0)) {
val key = rowKey.getLong(0) // <<< Values in rowKey are Intsnot 
Longs
map.append(key, unsafeRow)
  }
}
{code}


> LongHashedRelation fails on Big Endian platform
> ---
>
> Key: SPARK-15154
> URL: https://issues.apache.org/jira/browse/SPARK-15154
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Pete Robbins
>  Labels: big-endian
>
> NPE in 
> org.apache.spark.sql.execution.joins.HashedRelationSuite.LongToUnsafeRowMap
> Error Message
> java.lang.NullPointerException was thrown.
> Stacktrace
>   java.lang.NullPointerException
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3$$anonfun$apply$mcV$sp$1.apply$mcVI$sp(HashedRelationSuite.scala:121)
>   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3.apply$mcV$sp(HashedRelationSuite.scala:119)
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3.apply(HashedRelationSuite.scala:112)
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3.apply(HashedRelationSuite.scala:112)
>   at 
> org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
>   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:57)
>   at 
> org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
>   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
>   at org.scalatest.FunSuite.runTest(FunSuite.scala:1555)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at 

[jira] [Comment Edited] (SPARK-15154) LongHashedRelation fails on Big Endian platform

2016-05-06 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15273897#comment-15273897
 ] 

Pete Robbins edited comment on SPARK-15154 at 5/6/16 11:13 AM:
---

Is this just a testcase issue where in HashedRelationSuite


{code}
val key = Seq(BoundReference(0, IntegerType, false))
{code}

should be

{code}
val key = Seq(BoundReference(0, LongType, false))
{code}


Ans: No, it still fails with that change.


was (Author: robbinspg):
Is this just a testcase issue where in HashedRelationSuite


{code}
val key = Seq(BoundReference(0, IntegerType, false))
{code}

should be

{code}
val key = Seq(BoundReference(0, LongType, false))
{code}

> LongHashedRelation fails on Big Endian platform
> ---
>
> Key: SPARK-15154
> URL: https://issues.apache.org/jira/browse/SPARK-15154
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Pete Robbins
>  Labels: big-endian
>
> NPE in 
> org.apache.spark.sql.execution.joins.HashedRelationSuite.LongToUnsafeRowMap
> Error Message
> java.lang.NullPointerException was thrown.
> Stacktrace
>   java.lang.NullPointerException
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3$$anonfun$apply$mcV$sp$1.apply$mcVI$sp(HashedRelationSuite.scala:121)
>   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3.apply$mcV$sp(HashedRelationSuite.scala:119)
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3.apply(HashedRelationSuite.scala:112)
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3.apply(HashedRelationSuite.scala:112)
>   at 
> org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
>   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:57)
>   at 
> org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
>   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
>   at org.scalatest.FunSuite.runTest(FunSuite.scala:1555)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
>   at 
> org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
>   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
>   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
>   at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
>   at org.scalatest.Suite$class.run(Suite.scala:1424)
>   at 
> org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>   at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
>   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
>   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:29)
>   at 
> org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257)
>   at 
> org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256)
>   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:29)
>   at org.scalatest.Suite$class.callExecuteOnSuite$1(Suite.scala:1492)
>   at 
> org.scalatest.Suite$$anonfun$runNestedSuites$1.apply(Suite.scala:1528)
>   at 
> org.scalatest.Suite$$anonfun$runNestedSuites$1.apply(Suite.scala:1526)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at 

[jira] [Commented] (SPARK-15154) LongHashedRelation fails on Big Endian platform

2016-05-06 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15273897#comment-15273897
 ] 

Pete Robbins commented on SPARK-15154:
--

Is this just a testcase issue where in HashedRelationSuite


{code}
val key = Seq(BoundReference(0, IntegerType, false))
{code}

should be

{code}
val key = Seq(BoundReference(0, LongType, false))
{code}

> LongHashedRelation fails on Big Endian platform
> ---
>
> Key: SPARK-15154
> URL: https://issues.apache.org/jira/browse/SPARK-15154
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Pete Robbins
>  Labels: big-endian
>
> NPE in 
> org.apache.spark.sql.execution.joins.HashedRelationSuite.LongToUnsafeRowMap
> Error Message
> java.lang.NullPointerException was thrown.
> Stacktrace
>   java.lang.NullPointerException
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3$$anonfun$apply$mcV$sp$1.apply$mcVI$sp(HashedRelationSuite.scala:121)
>   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3.apply$mcV$sp(HashedRelationSuite.scala:119)
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3.apply(HashedRelationSuite.scala:112)
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3.apply(HashedRelationSuite.scala:112)
>   at 
> org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
>   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:57)
>   at 
> org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
>   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
>   at org.scalatest.FunSuite.runTest(FunSuite.scala:1555)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
>   at 
> org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
>   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
>   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
>   at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
>   at org.scalatest.Suite$class.run(Suite.scala:1424)
>   at 
> org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>   at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
>   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
>   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:29)
>   at 
> org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257)
>   at 
> org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256)
>   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:29)
>   at org.scalatest.Suite$class.callExecuteOnSuite$1(Suite.scala:1492)
>   at 
> org.scalatest.Suite$$anonfun$runNestedSuites$1.apply(Suite.scala:1528)
>   at 
> org.scalatest.Suite$$anonfun$runNestedSuites$1.apply(Suite.scala:1526)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
>   at org.scalatest.Suite$class.runNestedSuites(Suite.scala:1526)
>   at 
> org.scalatest.tools.DiscoverySuite.runNestedSuites(DiscoverySuite.scala:29)
>   at org.scalatest.Suite$class.run(Suite.scala:1421)
>   at org.scalatest.tools.DiscoverySuite.run(DiscoverySuite.scala:29)
>   

[jira] [Comment Edited] (SPARK-15154) LongHashedRelation fails on Big Endian platform

2016-05-06 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15273885#comment-15273885
 ] 

Pete Robbins edited comment on SPARK-15154 at 5/6/16 10:28 AM:
---

[~davies] as you are the author of this code can you comment on my findings?

So the issue here is that the keyGenerator returns an UnsafeRow containing Int 
values, but the code below from LongHashedRelation.apply retrieves the key from 
it as a Long. The bytes in the row are:

on Little Endian: 01 00 00 00 00 00 00 00
on Big Endian:    00 00 00 01 00 00 00 00

By chance getInt and getLong will both return "1" on Little Endian, whereas on 
Big Endian getInt returns "1" but getLong will return "268435456"

{code}
val keyGenerator = UnsafeProjection.create(key)

// Create a mapping of key -> rows
var numFields = 0
while (input.hasNext) {
  val unsafeRow = input.next().asInstanceOf[UnsafeRow]
  numFields = unsafeRow.numFields()
  val rowKey = keyGenerator(unsafeRow)
  if (!rowKey.isNullAt(0)) {
val key = rowKey.getLong(0) // <<< Values in rowKey are Intsnot 
Longs
map.append(key, unsafeRow)
  }
}
{code}



was (Author: robbinspg):
[~davies] as you are the author of this code can you comment on my findings?

So the issue here is that the keyGenerator returns an UnsafeRow containing Int 
values but the code below from LongHashedRelation.apply retrieves the key from 
this as a Long. The bytes in the row are

on Little Endian: 01 00 00 00 00 00 00 00 
on Big Endian:   00 00 00 01 00 00 00 00

By chance getInt and getLong  will both return "1" on Little Endian whereas on 
Big Endian getInt returns "1" but get Long will return "268435456"

{quote}
val keyGenerator = UnsafeProjection.create(key)

// Create a mapping of key -> rows
var numFields = 0
while (input.hasNext) {
  val unsafeRow = input.next().asInstanceOf[UnsafeRow]
  numFields = unsafeRow.numFields()
  val rowKey = keyGenerator(unsafeRow)
  if (!rowKey.isNullAt(0)) {
val key = rowKey.getLong(0) // <<< Values in rowKey are Int not Long

map.append(key, unsafeRow)
  }
}
{quote}


> LongHashedRelation fails on Big Endian platform
> ---
>
> Key: SPARK-15154
> URL: https://issues.apache.org/jira/browse/SPARK-15154
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Pete Robbins
>  Labels: big-endian
>
> NPE in 
> org.apache.spark.sql.execution.joins.HashedRelationSuite.LongToUnsafeRowMap
> Error Message
> java.lang.NullPointerException was thrown.
> Stacktrace
>   java.lang.NullPointerException
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3$$anonfun$apply$mcV$sp$1.apply$mcVI$sp(HashedRelationSuite.scala:121)
>   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3.apply$mcV$sp(HashedRelationSuite.scala:119)
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3.apply(HashedRelationSuite.scala:112)
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3.apply(HashedRelationSuite.scala:112)
>   at 
> org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
>   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:57)
>   at 
> org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
>   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
>   at org.scalatest.FunSuite.runTest(FunSuite.scala:1555)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
>   at 
> 

[jira] [Comment Edited] (SPARK-15154) LongHashedRelation fails on Big Endian platform

2016-05-06 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15273885#comment-15273885
 ] 

Pete Robbins edited comment on SPARK-15154 at 5/6/16 10:27 AM:
---

[~davies] as you are the author of this code can you comment on my findings?

So the issue here is that the keyGenerator returns an UnsafeRow containing Int 
values, but the code below from LongHashedRelation.apply retrieves the key from 
it as a Long. The bytes in the row are:

on Little Endian: 01 00 00 00 00 00 00 00
on Big Endian:    00 00 00 01 00 00 00 00

By chance getInt and getLong will both return "1" on Little Endian, whereas on 
Big Endian getInt returns "1" but getLong will return "268435456"

{quote}
val keyGenerator = UnsafeProjection.create(key)

// Create a mapping of key -> rows
var numFields = 0
while (input.hasNext) {
  val unsafeRow = input.next().asInstanceOf[UnsafeRow]
  numFields = unsafeRow.numFields()
  val rowKey = keyGenerator(unsafeRow)
  if (!rowKey.isNullAt(0)) {
val key = rowKey.getLong(0) // <<< Values in rowKey are Int not Long

map.append(key, unsafeRow)
  }
}
{quote}



was (Author: robbinspg):
[~davies] as you are the author of this code can you comment on my findings?

So the issue here is that the keyGenerator returns an UnsafeRow containing Int 
values but the code below from LongHashedRelation.apply retrieves the key from 
this as a Long. The bytes in the row are

on Little Endian: 01 00 00 00 00 00 00 00 
on Big Endian:   00 00 00 01 00 00 00 00

By chance getInt and getLong will both return "1" on Little Endian, whereas on 
Big Endian getInt returns "1" but getLong will return "268435456".


```
val keyGenerator = UnsafeProjection.create(key)

// Create a mapping of key -> rows
var numFields = 0
while (input.hasNext) {
  val unsafeRow = input.next().asInstanceOf[UnsafeRow]
  numFields = unsafeRow.numFields()
  val rowKey = keyGenerator(unsafeRow)
  if (!rowKey.isNullAt(0)) {
val key = rowKey.getLong(0) // <<< Values in rowKey are Int not Long

map.append(key, unsafeRow)
  }
}
```


> LongHashedRelation fails on Big Endian platform
> ---
>
> Key: SPARK-15154
> URL: https://issues.apache.org/jira/browse/SPARK-15154
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Pete Robbins
>  Labels: big-endian
>
> NPE in 
> org.apache.spark.sql.execution.joins.HashedRelationSuite.LongToUnsafeRowMap
> Error Message
> java.lang.NullPointerException was thrown.
> Stacktrace
>   java.lang.NullPointerException
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3$$anonfun$apply$mcV$sp$1.apply$mcVI$sp(HashedRelationSuite.scala:121)
>   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3.apply$mcV$sp(HashedRelationSuite.scala:119)
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3.apply(HashedRelationSuite.scala:112)
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3.apply(HashedRelationSuite.scala:112)
>   at 
> org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
>   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:57)
>   at 
> org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
>   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
>   at org.scalatest.FunSuite.runTest(FunSuite.scala:1555)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
>   at 
> 

[jira] [Comment Edited] (SPARK-15154) LongHashedRelation fails on Big Endian platform

2016-05-06 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15273885#comment-15273885
 ] 

Pete Robbins edited comment on SPARK-15154 at 5/6/16 10:24 AM:
---

[~davies] as you are the author of this code can you comment on my findings?

So the issue here is that the keyGenerator returns an UnsafeRow containing Int 
values but the code below from LongHashedRelation.apply retrieves the key from 
this as a Long. The bytes in the row are

on Little Endian: 01 00 00 00 00 00 00 00 
on Big Endian:   00 00 00 01 00 00 00 00

By chance getInt and getLong will both return "1" on Little Endian, whereas on 
Big Endian getInt returns "1" but getLong will return "268435456".



val keyGenerator = UnsafeProjection.create(key)

// Create a mapping of key -> rows
var numFields = 0
while (input.hasNext) {
  val unsafeRow = input.next().asInstanceOf[UnsafeRow]
  numFields = unsafeRow.numFields()
  val rowKey = keyGenerator(unsafeRow)
  if (!rowKey.isNullAt(0)) {
val key = rowKey.getLong(0) // <<< Values in rowKey are Int not Long

map.append(key, unsafeRow)
  }
}




was (Author: robbinspg):
[~davies] as you are the author of this code can you comment on my findings?

So the issue here is that the keyGenerator returns an UnsafeRow containing Int 
values but the code below from LongHashedRelation.apply retrieves the key from 
this as a Long. The bytes in the row are

on Little Endian: 01 00 00 00 00 00 00 00 
on Big Endian:   00 00 00 01 00 00 00 00

By chance getInt and getLong will both return "1" on Little Endian, whereas on 
Big Endian getInt returns "1" but getLong will return "268435456".



val keyGenerator = UnsafeProjection.create(key)

// Create a mapping of key -> rows
var numFields = 0
while (input.hasNext) {
  val unsafeRow = input.next().asInstanceOf[UnsafeRow]
  numFields = unsafeRow.numFields()
  val rowKey = keyGenerator(unsafeRow)
  if (!rowKey.isNullAt(0)) {
val key = rowKey.getLong(0) // <<< Values in rowKey are Int not Long
map.append(key, unsafeRow)
  }
}



> LongHashedRelation fails on Big Endian platform
> ---
>
> Key: SPARK-15154
> URL: https://issues.apache.org/jira/browse/SPARK-15154
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Pete Robbins
>  Labels: big-endian
>
> NPE in 
> org.apache.spark.sql.execution.joins.HashedRelationSuite.LongToUnsafeRowMap
> Error Message
> java.lang.NullPointerException was thrown.
> Stacktrace
>   java.lang.NullPointerException
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3$$anonfun$apply$mcV$sp$1.apply$mcVI$sp(HashedRelationSuite.scala:121)
>   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3.apply$mcV$sp(HashedRelationSuite.scala:119)
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3.apply(HashedRelationSuite.scala:112)
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3.apply(HashedRelationSuite.scala:112)
>   at 
> org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
>   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:57)
>   at 
> org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
>   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
>   at org.scalatest.FunSuite.runTest(FunSuite.scala:1555)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
>   at 
> 

[jira] [Comment Edited] (SPARK-15154) LongHashedRelation fails on Big Endian platform

2016-05-06 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15273885#comment-15273885
 ] 

Pete Robbins edited comment on SPARK-15154 at 5/6/16 10:25 AM:
---

[~davies] as you are the author of this code can you comment on my findings?

So the issue here is that the keyGenerator returns an UnsafeRow containing Int 
values but the code below from LongHashedRelation.apply retrieves the key from 
this as a Long. The bytes in the row are

on Little Endian: 01 00 00 00 00 00 00 00 
on Big Endian:   00 00 00 01 00 00 00 00

By chance getInt and getLong will both return "1" on Little Endian, whereas on 
Big Endian getInt returns "1" but getLong will return "268435456".


```
val keyGenerator = UnsafeProjection.create(key)

// Create a mapping of key -> rows
var numFields = 0
while (input.hasNext) {
  val unsafeRow = input.next().asInstanceOf[UnsafeRow]
  numFields = unsafeRow.numFields()
  val rowKey = keyGenerator(unsafeRow)
  if (!rowKey.isNullAt(0)) {
val key = rowKey.getLong(0) // <<< Values in rowKey are Int not Long

map.append(key, unsafeRow)
  }
}
```



was (Author: robbinspg):
[~davies] as you are the author of this code can you comment on my findings?

So the issue here is that the keyGenerator returns an UnsafeRow containing Int 
values but the code below from LongHashedRelation.apply retrieves the key from 
this as a Long. The bytes in the row are

on Little Endian: 01 00 00 00 00 00 00 00 
on Big Endian:   00 00 00 01 00 00 00 00

By chance getInt and getLong will both return "1" on Little Endian, whereas on 
Big Endian getInt returns "1" but getLong will return "268435456".



val keyGenerator = UnsafeProjection.create(key)

// Create a mapping of key -> rows
var numFields = 0
while (input.hasNext) {
  val unsafeRow = input.next().asInstanceOf[UnsafeRow]
  numFields = unsafeRow.numFields()
  val rowKey = keyGenerator(unsafeRow)
  if (!rowKey.isNullAt(0)) {
val key = rowKey.getLong(0) // <<< Values in rowKey are Int not Long

map.append(key, unsafeRow)
  }
}



> LongHashedRelation fails on Big Endian platform
> ---
>
> Key: SPARK-15154
> URL: https://issues.apache.org/jira/browse/SPARK-15154
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Pete Robbins
>  Labels: big-endian
>
> NPE in 
> org.apache.spark.sql.execution.joins.HashedRelationSuite.LongToUnsafeRowMap
> Error Message
> java.lang.NullPointerException was thrown.
> Stacktrace
>   java.lang.NullPointerException
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3$$anonfun$apply$mcV$sp$1.apply$mcVI$sp(HashedRelationSuite.scala:121)
>   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3.apply$mcV$sp(HashedRelationSuite.scala:119)
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3.apply(HashedRelationSuite.scala:112)
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3.apply(HashedRelationSuite.scala:112)
>   at 
> org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
>   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:57)
>   at 
> org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
>   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
>   at org.scalatest.FunSuite.runTest(FunSuite.scala:1555)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
>   at 
> 

[jira] [Commented] (SPARK-15154) LongHashedRelation fails on Big Endian platform

2016-05-06 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15273885#comment-15273885
 ] 

Pete Robbins commented on SPARK-15154:
--

[~davies] as you are the author of this code can you comment on my findings?

So the issue here is that the keyGenerator returns an UnsafeRow containing Int 
values but the code below from LongHashedRelation.apply retrieves the key from 
this as a Long. The bytes in the row are

on Little Endian: 01 00 00 00 00 00 00 00 
on Big Endian:   00 00 00 01 00 00 00 00

By chance getInt and getLong will both return "1" on Little Endian, whereas on 
Big Endian getInt returns "1" but getLong will return "268435456".



val keyGenerator = UnsafeProjection.create(key)

// Create a mapping of key -> rows
var numFields = 0
while (input.hasNext) {
  val unsafeRow = input.next().asInstanceOf[UnsafeRow]
  numFields = unsafeRow.numFields()
  val rowKey = keyGenerator(unsafeRow)
  if (!rowKey.isNullAt(0)) {
val key = rowKey.getLong(0) // <<< Values in rowKey are Int not Long
map.append(key, unsafeRow)
  }
}
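
For anyone wanting to reproduce the effect outside Spark, here is a minimal, 
self-contained sketch (plain java.nio.ByteBuffer rather than the actual 
UnsafeRow/Platform code) of why reading an Int-sized key slot back as a Long is 
byte-order dependent. The exact corrupted value in the failing test depends on 
the row layout; the point is that only on little endian does the Long read 
happen to equal the Int value.

{code}
import java.nio.{ByteBuffer, ByteOrder}

// Mimic an 8-byte fixed-width field slot holding the Int key "1".
def readSlot(order: ByteOrder): (Int, Long) = {
  val slot = ByteBuffer.allocate(8).order(order)
  slot.putInt(0, 1)                   // store the key as an Int
  (slot.getInt(0), slot.getLong(0))   // read it back as Int and as Long
}

val (leInt, leLong) = readSlot(ByteOrder.LITTLE_ENDIAN)
val (beInt, beLong) = readSlot(ByteOrder.BIG_ENDIAN)

println(s"little endian: getInt=$leInt getLong=$leLong") // getInt=1 getLong=1
println(s"big endian:    getInt=$beInt getLong=$beLong") // getInt=1 getLong=4294967296
{code}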



> LongHashedRelation fails on Big Endian platform
> ---
>
> Key: SPARK-15154
> URL: https://issues.apache.org/jira/browse/SPARK-15154
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Pete Robbins
>  Labels: big-endian
>
> NPE in 
> org.apache.spark.sql.execution.joins.HashedRelationSuite.LongToUnsafeRowMap
> Error Message
> java.lang.NullPointerException was thrown.
> Stacktrace
>   java.lang.NullPointerException
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3$$anonfun$apply$mcV$sp$1.apply$mcVI$sp(HashedRelationSuite.scala:121)
>   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3.apply$mcV$sp(HashedRelationSuite.scala:119)
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3.apply(HashedRelationSuite.scala:112)
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3.apply(HashedRelationSuite.scala:112)
>   at 
> org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
>   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:57)
>   at 
> org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
>   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
>   at org.scalatest.FunSuite.runTest(FunSuite.scala:1555)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
>   at 
> org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
>   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
>   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
>   at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
>   at org.scalatest.Suite$class.run(Suite.scala:1424)
>   at 
> org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>   at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
>   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
>   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:29)
>   at 
> org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257)
>   at 
> org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256)
>   at 

[jira] [Updated] (SPARK-15154) LongHashedRelation fails on Big Endian platform

2016-05-05 Thread Pete Robbins (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pete Robbins updated SPARK-15154:
-
Summary: LongHashedRelation fails on Big Endian platform  (was: 
HashedRelation fails on Big Endian platform)

> LongHashedRelation fails on Big Endian platform
> ---
>
> Key: SPARK-15154
> URL: https://issues.apache.org/jira/browse/SPARK-15154
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Pete Robbins
>  Labels: big-endian
>
> NPE in 
> org.apache.spark.sql.execution.joins.HashedRelationSuite.LongToUnsafeRowMap
> Error Message
> java.lang.NullPointerException was thrown.
> Stacktrace
>   java.lang.NullPointerException
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3$$anonfun$apply$mcV$sp$1.apply$mcVI$sp(HashedRelationSuite.scala:121)
>   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3.apply$mcV$sp(HashedRelationSuite.scala:119)
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3.apply(HashedRelationSuite.scala:112)
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3.apply(HashedRelationSuite.scala:112)
>   at 
> org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
>   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:57)
>   at 
> org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
>   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
>   at org.scalatest.FunSuite.runTest(FunSuite.scala:1555)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
>   at 
> org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
>   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
>   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
>   at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
>   at org.scalatest.Suite$class.run(Suite.scala:1424)
>   at 
> org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>   at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
>   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
>   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:29)
>   at 
> org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257)
>   at 
> org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256)
>   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:29)
>   at org.scalatest.Suite$class.callExecuteOnSuite$1(Suite.scala:1492)
>   at 
> org.scalatest.Suite$$anonfun$runNestedSuites$1.apply(Suite.scala:1528)
>   at 
> org.scalatest.Suite$$anonfun$runNestedSuites$1.apply(Suite.scala:1526)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
>   at org.scalatest.Suite$class.runNestedSuites(Suite.scala:1526)
>   at 
> org.scalatest.tools.DiscoverySuite.runNestedSuites(DiscoverySuite.scala:29)
>   at org.scalatest.Suite$class.run(Suite.scala:1421)
>   at org.scalatest.tools.DiscoverySuite.run(DiscoverySuite.scala:29)
>   at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:55)
>   at 
> 

[jira] [Updated] (SPARK-15154) HashedRelation fails on Big Endian platform

2016-05-05 Thread Pete Robbins (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pete Robbins updated SPARK-15154:
-
Labels: big-endian  (was: )

> HashedRelation fails on Big Endian platform
> ---
>
> Key: SPARK-15154
> URL: https://issues.apache.org/jira/browse/SPARK-15154
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Pete Robbins
>  Labels: big-endian
>
> NPE in 
> org.apache.spark.sql.execution.joins.HashedRelationSuite.LongToUnsafeRowMap
> Error Message
> java.lang.NullPointerException was thrown.
> Stacktrace
>   java.lang.NullPointerException
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3$$anonfun$apply$mcV$sp$1.apply$mcVI$sp(HashedRelationSuite.scala:121)
>   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3.apply$mcV$sp(HashedRelationSuite.scala:119)
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3.apply(HashedRelationSuite.scala:112)
>   at 
> org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3.apply(HashedRelationSuite.scala:112)
>   at 
> org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
>   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:57)
>   at 
> org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
>   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
>   at org.scalatest.FunSuite.runTest(FunSuite.scala:1555)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
>   at 
> org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
>   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
>   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
>   at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
>   at org.scalatest.Suite$class.run(Suite.scala:1424)
>   at 
> org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>   at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
>   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
>   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:29)
>   at 
> org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257)
>   at 
> org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256)
>   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:29)
>   at org.scalatest.Suite$class.callExecuteOnSuite$1(Suite.scala:1492)
>   at 
> org.scalatest.Suite$$anonfun$runNestedSuites$1.apply(Suite.scala:1528)
>   at 
> org.scalatest.Suite$$anonfun$runNestedSuites$1.apply(Suite.scala:1526)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
>   at org.scalatest.Suite$class.runNestedSuites(Suite.scala:1526)
>   at 
> org.scalatest.tools.DiscoverySuite.runNestedSuites(DiscoverySuite.scala:29)
>   at org.scalatest.Suite$class.run(Suite.scala:1421)
>   at org.scalatest.tools.DiscoverySuite.run(DiscoverySuite.scala:29)
>   at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:55)
>   at 
> org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2563)
>   at 
> 

[jira] [Created] (SPARK-15154) HashedRelation fails on Big Endian platform

2016-05-05 Thread Pete Robbins (JIRA)
Pete Robbins created SPARK-15154:


 Summary: HashedRelation fails on Big Endian platform
 Key: SPARK-15154
 URL: https://issues.apache.org/jira/browse/SPARK-15154
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Pete Robbins


NPE in 
org.apache.spark.sql.execution.joins.HashedRelationSuite.LongToUnsafeRowMap

Error Message

java.lang.NullPointerException was thrown.

Stacktrace

  java.lang.NullPointerException
  at 
org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3$$anonfun$apply$mcV$sp$1.apply$mcVI$sp(HashedRelationSuite.scala:121)
  at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
  at 
org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3.apply$mcV$sp(HashedRelationSuite.scala:119)
  at 
org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3.apply(HashedRelationSuite.scala:112)
  at 
org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3.apply(HashedRelationSuite.scala:112)
  at 
org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
  at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
  at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
  at org.scalatest.Transformer.apply(Transformer.scala:22)
  at org.scalatest.Transformer.apply(Transformer.scala:20)
  at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
  at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:57)
  at 
org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
  at 
org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
  at 
org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
  at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
  at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
  at org.scalatest.FunSuite.runTest(FunSuite.scala:1555)
  at 
org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
  at 
org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
  at 
org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
  at 
org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
  at scala.collection.immutable.List.foreach(List.scala:381)
  at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
  at 
org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
  at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
  at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
  at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
  at org.scalatest.Suite$class.run(Suite.scala:1424)
  at 
org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
  at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
  at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
  at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
  at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
  at 
org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:29)
  at 
org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257)
  at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256)
  at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:29)
  at org.scalatest.Suite$class.callExecuteOnSuite$1(Suite.scala:1492)
  at org.scalatest.Suite$$anonfun$runNestedSuites$1.apply(Suite.scala:1528)
  at org.scalatest.Suite$$anonfun$runNestedSuites$1.apply(Suite.scala:1526)
  at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
  at org.scalatest.Suite$class.runNestedSuites(Suite.scala:1526)
  at 
org.scalatest.tools.DiscoverySuite.runNestedSuites(DiscoverySuite.scala:29)
  at org.scalatest.Suite$class.run(Suite.scala:1421)
  at org.scalatest.tools.DiscoverySuite.run(DiscoverySuite.scala:29)
  at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:55)
  at 
org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2563)
  at 
org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2557)
  at scala.collection.immutable.List.foreach(List.scala:381)
  at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:2557)
  at 
org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1044)
  at 
org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1043)
 

[jira] [Commented] (SPARK-15070) Data corruption when using Dataset.groupBy[K : Encoder](func: T => K) when data loaded from JSON file.

2016-05-03 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268500#comment-15268500
 ] 

Pete Robbins commented on SPARK-15070:
--

could this be related to https://issues.apache.org/jira/browse/SPARK-12555 ?

> Data corruption when using Dataset.groupBy[K : Encoder](func: T => K) when 
> data loaded from JSON file.
> --
>
> Key: SPARK-15070
> URL: https://issues.apache.org/jira/browse/SPARK-15070
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output, SQL
>Affects Versions: 1.6.1
> Environment: produced on Mac OS X 10.11.4 in local mode
>Reporter: Eric Wasserman
>
> full running case at: https://github.com/ewasserman/spark-bug.git
> Bug.scala
> ==
> package bug
> import org.apache.spark.sql.functions._
> import org.apache.spark.sql.SQLContext
> import org.apache.spark.{SparkContext, SparkConf}
> case class BugRecord(m: String, elapsed_time: java.lang.Double)
> object Bug {
>   def main(args: Array[String]): Unit = {
> val c = new SparkConf().setMaster("local[2]").setAppName("BugTest")
> val sc = new SparkContext(c)
> val sqlc = new SQLContext(sc)
> import sqlc.implicits._
> val logs = sqlc.read.json("bug-data.json").as[BugRecord]
> logs.groupBy(r => "FOO").agg(avg($"elapsed_time").as[Double]).show(20, 
> truncate = false)
> 
> sc.stop()
>   }
> }
> bug-data.json
> ==
> {"m":"POST","elapsed_time":0.123456789012345678,"source_time":"abcdefghijk"}
> -
> Expected Output:
> +---+-------------------+
> |_1 |_2                 |
> +---+-------------------+
> |FOO|0.12345678901234568|
> +---+-------------------+
> Observed Output:
> +-------+-------------------+
> |_1     |_2                 |
> +-------+-------------------+
> |POSTabc|0.12345726584950388|
> +-------+-------------------+
> The grouping key has been corrupted (it is *not* the product of the groupBy 
> function) and is a combination of bytes from the actual key column and an 
> extra attribute in the JSON not present in the case class. The aggregated 
> value is also corrupted.
> NOTE:
> The problem does not manifest when using an alternate form of groupBy:
> logs.groupBy($"m").agg(avg($"elapsed_time").as[Double])
> The corrupted key problem does not manifest when there is not an additional 
> field in the JSON. Ie. if the data file is this:
> {"m":"POST","elapsed_time":0.123456789012345678}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13552) Incorrect data for Long.minValue in SQLQuerySuite on IBM Java

2016-05-03 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268232#comment-15268232
 ] 

Pete Robbins commented on SPARK-13552:
--

[~aroberts] This Jira can be closed as this is not a Spark issue

> Incorrect data for Long.minValue in SQLQuerySuite on IBM Java
> -
>
> Key: SPARK-13552
> URL: https://issues.apache.org/jira/browse/SPARK-13552
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
> Environment: IBM Java only, all platforms
>Reporter: Adam Roberts
>Priority: Minor
> Attachments: DefectBadMinValueLongResized.jpg
>
>
> The Long.minValue test fails on IBM Java 8, we get the following incorrect 
> answer with the slightly simplified test case:
> {code:SQL}
> val tester = sql(s"SELECT ${Long.MinValue} FROM testData")
> {code}
> result is
> _-9,223,372,041,149,743,104_ instead of _-9,223,372,036,854,775,808_ (there's 
> only one bit difference if we convert to binary representation).
> Here's the full test output:
> {code}
> Results do not match for query:
> == Parsed Logical Plan ==
> 'GlobalLimit 1
> +- 'LocalLimit 1
>+- 'Sort ['key ASC], true
>   +- 'Project [unresolvedalias(-9223372036854775808, None)]
>  +- 'UnresolvedRelation `testData`, None
> == Analyzed Logical Plan ==
> (-9223372036854775808): decimal(19,0)
> GlobalLimit 1
> +- LocalLimit 1
>+- Project [(-9223372036854775808)#4391]
>   +- Sort [key#101 ASC], true
>  +- Project [-9223372036854775808 AS 
> (-9223372036854775808)#4391,key#101]
> +- SubqueryAlias testData
>+- LogicalRDD [key#101,value#102], MapPartitionsRDD[3] at 
> beforeAll at BeforeAndAfterAll.scala:187
> == Optimized Logical Plan ==
> GlobalLimit 1
> +- LocalLimit 1
>+- Project [(-9223372036854775808)#4391]
>   +- Sort [key#101 ASC], true
>  +- Project [-9223372036854775808 AS 
> (-9223372036854775808)#4391,key#101]
> +- LogicalRDD [key#101,value#102], MapPartitionsRDD[3] at 
> beforeAll at BeforeAndAfterAll.scala:187
> == Physical Plan ==
> TakeOrderedAndProject(limit=1, orderBy=[key#101 ASC], 
> output=[(-9223372036854775808)#4391])
> +- WholeStageCodegen
>:  +- Project [-9223372036854775808 AS (-9223372036854775808)#4391,key#101]
>: +- INPUT
>+- Scan ExistingRDD[key#101,value#102]
> == Results ==
> == Results ==
> !== Correct Answer - 1 ==   == Spark Answer - 1 ==
> ![-9223372036854775808] [-9223372041149743104]
> {code}
> Debugging in Intellij shows the query seems to be parsed OK and we eventually 
> have a schema with the correct data in the struct field but the BigDecimal's 
> BigInteger is incorrect when we have a GenericRowWithSchema.
> I've identified that the problem started when SPARK-12575 was implemented and 
> suspect the following paragraph is important:
> "Hive and the SQL Parser treat decimal literals differently. Hive will turn 
> any decimal into a Double whereas the SQL Parser would convert a 
> non-scientific decimal into a BigDecimal, and would turn a scientific decimal 
> into a Double. We follow Hive's behavior here. The new parser supports a big 
> decimal literal, for instance: 81923801.42BD, which can be used when a big 
> decimal is needed."
> Done, both "value" and "row" return the correct result for both Java 
> implementations: -9223372036854775808
> FWIW, I know the first time we can see the incorrect row values is in the 
> {code}withCallback[T]{code} method in DataFrame.scala, the specific line of 
> code is
> {code}
> val result = action(df)
> {code}
> Stepping into this doesn't clearly indicate how the resulting rows are being 
> produced though (could be that I'm debugging with the wrong thread in 
> Intellij - the first time I see a value for "result" is when it's too late - 
> when we're seeing the incorrect values).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13552) Incorrect data for Long.minValue in SQLQuerySuite on IBM Java

2016-04-28 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15262482#comment-15262482
 ] 

Pete Robbins commented on SPARK-13552:
--

This is looking like an issue with the IBM implementation of 
java.math.BigInteger. I'm still investigating and we can close this jira if my 
theory is correct.



> Incorrect data for Long.minValue in SQLQuerySuite on IBM Java
> -
>
> Key: SPARK-13552
> URL: https://issues.apache.org/jira/browse/SPARK-13552
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
> Environment: IBM Java only, all platforms
>Reporter: Adam Roberts
>Priority: Minor
> Attachments: DefectBadMinValueLongResized.jpg
>
>
> The Long.minValue test fails on IBM Java 8, we get the following incorrect 
> answer with the slightly simplified test case:
> {code:SQL}
> val tester = sql(s"SELECT ${Long.MinValue} FROM testData")
> {code}
> result is
> _-9,223,372,041,149,743,104_ instead of _-9,223,372,036,854,775,808_ (there's 
> only one bit difference if we convert to binary representation).
> Here's the full test output:
> {code}
> Results do not match for query:
> == Parsed Logical Plan ==
> 'GlobalLimit 1
> +- 'LocalLimit 1
>+- 'Sort ['key ASC], true
>   +- 'Project [unresolvedalias(-9223372036854775808, None)]
>  +- 'UnresolvedRelation `testData`, None
> == Analyzed Logical Plan ==
> (-9223372036854775808): decimal(19,0)
> GlobalLimit 1
> +- LocalLimit 1
>+- Project [(-9223372036854775808)#4391]
>   +- Sort [key#101 ASC], true
>  +- Project [-9223372036854775808 AS 
> (-9223372036854775808)#4391,key#101]
> +- SubqueryAlias testData
>+- LogicalRDD [key#101,value#102], MapPartitionsRDD[3] at 
> beforeAll at BeforeAndAfterAll.scala:187
> == Optimized Logical Plan ==
> GlobalLimit 1
> +- LocalLimit 1
>+- Project [(-9223372036854775808)#4391]
>   +- Sort [key#101 ASC], true
>  +- Project [-9223372036854775808 AS 
> (-9223372036854775808)#4391,key#101]
> +- LogicalRDD [key#101,value#102], MapPartitionsRDD[3] at 
> beforeAll at BeforeAndAfterAll.scala:187
> == Physical Plan ==
> TakeOrderedAndProject(limit=1, orderBy=[key#101 ASC], 
> output=[(-9223372036854775808)#4391])
> +- WholeStageCodegen
>:  +- Project [-9223372036854775808 AS (-9223372036854775808)#4391,key#101]
>: +- INPUT
>+- Scan ExistingRDD[key#101,value#102]
> == Results ==
> == Results ==
> !== Correct Answer - 1 ==   == Spark Answer - 1 ==
> ![-9223372036854775808] [-9223372041149743104]
> {code}
> Debugging in Intellij shows the query seems to be parsed OK and we eventually 
> have a schema with the correct data in the struct field but the BigDecimal's 
> BigInteger is incorrect when we have a GenericRowWithSchema.
> I've identified that the problem started when SPARK-12575 was implemented and 
> suspect the following paragraph is important:
> "Hive and the SQL Parser treat decimal literals differently. Hive will turn 
> any decimal into a Double whereas the SQL Parser would convert a 
> non-scientific decimal into a BigDecimal, and would turn a scientific decimal 
> into a Double. We follow Hive's behavior here. The new parser supports a big 
> decimal literal, for instance: 81923801.42BD, which can be used when a big 
> decimal is needed."
> Done, both "value" and "row" return the correct result for both Java 
> implementations: -9223372036854775808
> FWIW, I know the first time we can see the incorrect row values is in the 
> {code}withCallback[T]{code} method in DataFrame.scala, the specific line of 
> code is
> {code}
> val result = action(df)
> {code}
> Stepping into this doesn't clearly indicate how the resulting rows are being 
> produced though (could be that I'm debugging with the wrong thread in 
> Intellij - the first time I see a value for "result" is when it's too late - 
> when we're seeing the incorrect values).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-14848) DatasetSuite - Java encoder fails on Big Endian platforms

2016-04-22 Thread Pete Robbins (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pete Robbins updated SPARK-14848:
-
Description: 
Since this PR https://github.com/apache/spark/pull/10703 for 
https://issues.apache.org/jira/browse/SPARK-12756 the "Java encoder" test in 
DatasetSuite has been failing on big endian platforms:

- Java encoder *** FAILED ***
  Array((JavaData(2),1), (JavaData(1),1)) did not equal List((JavaData(1),1), 
(JavaData(2),1)) (DatasetSuite.scala:478)

I note that the code for the "Kryo encoder" test was changed in the PR to use 
toSet and compare results against a Set to stop it failing in the same way 
whereas the Java encoder test still uses toSeq. 

Is it that the order is not guaranteed (but happens to be in the expected order 
on little endian) and this is a test issue?

  was:
Since this PR https://github.com/apache/spark/pull/10703 for 
https://issues.apache.org/jira/browse/SPARK-12756 the "Java encoder" test in 
DatasetSuite has been failing on big endian platforms:

- Java encoder *** FAILED ***
  Array((JavaData(2),1), (JavaData(1),1)) did not equal List((JavaData(1),1), 
(JavaData(2),1)) (DatasetSuite.scala:478)

I note that the code for the "Kyro encoder" test was changed in the PR to use 
toSet and compare results against a Set to stop it failing in the same way 
whereas the Java encoder test still uses toSeq. 

Is it that the order is not guaranteed (but happens to be in the expected order 
on little endian) and this is a test issue?


> DatasetSuite - Java encoder fails on Big Endian platforms
> -
>
> Key: SPARK-14848
> URL: https://issues.apache.org/jira/browse/SPARK-14848
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Pete Robbins
>
> Since this PR https://github.com/apache/spark/pull/10703 for 
> https://issues.apache.org/jira/browse/SPARK-12756 the "Java encoder" test in 
> DatasetSuite has been failing on big endian platforms:
> - Java encoder *** FAILED ***
>   Array((JavaData(2),1), (JavaData(1),1)) did not equal List((JavaData(1),1), 
> (JavaData(2),1)) (DatasetSuite.scala:478)
> I note that the code for the "Kryo encoder" test was changed in the PR to use 
> toSet and compare results against a Set to stop it failing in the same way 
> whereas the Java encoder test still uses toSeq. 
> Is it that the order is not guaranteed (but happens to be in the expected 
> order on little endian) and this is a test issue?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14848) DatasetSuite - Java encoder fails on Big Endian platforms

2016-04-22 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253816#comment-15253816
 ] 

Pete Robbins commented on SPARK-14848:
--

Changing the Java encoder test to use toSet and compare against Set(...) makes 
the test pass on both little endian and big endian platforms.
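
A minimal illustration with plain Scala collections (JavaData here is just a 
stand-in case class, not the actual test's Java bean) of why the Set comparison 
removes the order dependence:

{code}
case class JavaData(a: Int)

// The same grouped results, in the order seen on little- and big-endian runs.
val littleEndianOrder = Seq((JavaData(1), 1L), (JavaData(2), 1L))
val bigEndianOrder    = Seq((JavaData(2), 1L), (JavaData(1), 1L))

// Seq equality is order-sensitive, so a toSeq-based check fails on big endian.
assert(littleEndianOrder != bigEndianOrder)

// Set equality ignores order, so the same results pass on both platforms.
assert(littleEndianOrder.toSet == bigEndianOrder.toSet)
{code}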

I will submit a PR.

[~cloud_fan] can you confirm my thoughts?

> DatasetSuite - Java encoder fails on Big Endian platforms
> -
>
> Key: SPARK-14848
> URL: https://issues.apache.org/jira/browse/SPARK-14848
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Pete Robbins
>
> Since this PR https://github.com/apache/spark/pull/10703 for 
> https://issues.apache.org/jira/browse/SPARK-12756 the "Java encoder" test in 
> DatasetSuite has been failing on big endian platforms:
> - Java encoder *** FAILED ***
>   Array((JavaData(2),1), (JavaData(1),1)) did not equal List((JavaData(1),1), 
> (JavaData(2),1)) (DatasetSuite.scala:478)
> I note that the code for the "Kyro encoder" test was changed in the PR to use 
> toSet and compare results against a Set to stop it failing in the same way 
> whereas the Java encoder test still uses toSeq. 
> Is it that the order is not guaranteed (but happens to be in the expected 
> order on little endian) and this is a test issue?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-14848) DatasetSuite - Java encoder fails on Big Endian platforms

2016-04-22 Thread Pete Robbins (JIRA)
Pete Robbins created SPARK-14848:


 Summary: DatasetSuite - Java encoder fails on Big Endian platforms
 Key: SPARK-14848
 URL: https://issues.apache.org/jira/browse/SPARK-14848
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Pete Robbins


Since this PR https://github.com/apache/spark/pull/10703 for 
https://issues.apache.org/jira/browse/SPARK-12756 the "Java encoder" test in 
DatasetSuite has been failing on big endian platforms:

- Java encoder *** FAILED ***
  Array((JavaData(2),1), (JavaData(1),1)) did not equal List((JavaData(1),1), 
(JavaData(2),1)) (DatasetSuite.scala:478)

I note that the code for the "Kyro encoder" test was changed in the PR to use 
toSet and compare results against a Set to stop it failing in the same way 
whereas the Java encoder test still uses toSeq. 

Is it that the order is not guaranteed (but happens to be in the expected order 
on little endian) and this is a test issue?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14151) Propose to refactor and expose Metrics Sink and Source interface

2016-03-25 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15211638#comment-15211638
 ] 

Pete Robbins commented on SPARK-14151:
--

Agreed, that is the way to go. I was also working on it but will leave it to 
you. In addition, we will need to document the interfaces and their use.

> Propose to refactor and expose Metrics Sink and Source interface
> 
>
> Key: SPARK-14151
> URL: https://issues.apache.org/jira/browse/SPARK-14151
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Saisai Shao
>Priority: Minor
>
> MetricsSystem is designed for plugging in different sources and sinks: users can 
> write their own sources and sinks, configure them through metrics.properties, 
> and MetricsSystem will register them through reflection. But the current Source 
> and Sink interfaces are private, which means users cannot create their own 
> sources and sinks unless they are in the same package.
> So here we propose to expose the Source and Sink interfaces; this will let users 
> build and maintain their own sources and sinks, alleviating the maintenance 
> overhead on the Spark codebase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14151) Propose to expose Metrics Sink and Source interface

2016-03-25 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15211560#comment-15211560
 ] 

Pete Robbins commented on SPARK-14151:
--

In addition the constructor used by MetricsSystem for Sinks passes the 
SecurityManager which is also marked as private[spark]. Currently only the 
MetricsServlet sink uses this.

We could either (a) remove private[spark] from SecurityManager, or (b) add 
additional logic in MetricsSystem to look for a Sink constructor that does not 
have the SecurityManager as a parameter if the one with SecurityManager is not 
found.
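
A rough sketch of option (b), purely for illustration; the method name and 
constructor shapes are assumed here rather than taken from the actual 
MetricsSystem code:

{code}
import java.util.Properties
import com.codahale.metrics.MetricRegistry

// Prefer the constructor that takes the security manager, and fall back to a
// (Properties, MetricRegistry) constructor for sinks built outside Spark.
def instantiateSink(sinkClass: Class[_],
                    props: Properties,
                    registry: MetricRegistry,
                    secMgrClass: Class[_],
                    secMgr: AnyRef): Any = {
  try {
    sinkClass.getConstructor(classOf[Properties], classOf[MetricRegistry], secMgrClass)
             .newInstance(props, registry, secMgr)
  } catch {
    case _: NoSuchMethodException =>
      sinkClass.getConstructor(classOf[Properties], classOf[MetricRegistry])
               .newInstance(props, registry)
  }
}
{code}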

> Propose to expose Metrics Sink and Source interface
> ---
>
> Key: SPARK-14151
> URL: https://issues.apache.org/jira/browse/SPARK-14151
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Saisai Shao
>Priority: Minor
>
> MetricsSystem is designed for plugging in different sources and sinks: users can 
> write their own sources and sinks, configure them through metrics.properties, 
> and MetricsSystem will register them through reflection. But the current Source 
> and Sink interfaces are private, which means users cannot create their own 
> sources and sinks unless they are in the same package.
> So here we propose to expose the Source and Sink interfaces; this will let users 
> build and maintain their own sources and sinks, alleviating the maintenance 
> overhead on the Spark codebase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-10610) Using AppName instead of AppId in the name of all metrics

2016-03-04 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179507#comment-15179507
 ] 

Pete Robbins edited comment on SPARK-10610 at 3/4/16 8:01 AM:
--

I think the appId is an important piece of information when visualizing the 
metrics, along with hostname, executorId, etc. I'm writing a sink and reporter 
to push the metrics to Elasticsearch and I include these in the metrics types 
for better correlation, e.g.:

{
  "timestamp": "2016-03-03T15:58:31.903+",
  "hostName": "9.20.187.127",
  "applicationId": "app-20160303155742-0005",
  "executorId": "driver",
  "BlockManager_memory_maxMem_MB": 3933
}

The appId and executorId I extract from the metric name. When the sink is 
instantiated I don't believe I have access to any Utils to obtain the current 
appId and executorId, so I'm kind of relying on these being in the metric name 
for the moment.
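
For reference, the extraction is roughly the following; it assumes the 
registered metric names have an "appId.executorId.rest" shape, which is an 
assumption on my part rather than a documented contract:

{code}
// Split e.g. "app-20160303155742-0005.driver.BlockManager.memory.maxMem_MB"
// into (appId, executorId, remaining metric name). Illustrative helper only.
def splitMetricName(name: String): (String, String, String) = {
  val parts = name.split("\\.", 3)
  (parts(0), parts(1), parts(2))
}

val (appId, executorId, metric) =
  splitMetricName("app-20160303155742-0005.driver.BlockManager.memory.maxMem_MB")
// appId = "app-20160303155742-0005", executorId = "driver",
// metric = "BlockManager.memory.maxMem_MB"
{code}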

Is it possible to make appId, applicationName, executorId available to me via 
some Utils function that I have access to in a metrics Sink?

I guess I'm asking: How can I get hold of the SparkConf if I've not been passed 
it?


was (Author: robbinspg):
I think the appId is an important piece of information when visualizing the 
metrics, along with hostname, executorId, etc. I'm writing a sink and reporter 
to push the metrics to Elasticsearch and I include these in the metrics types 
for better correlation, e.g.:

{
  "timestamp": "2016-03-03T15:58:31.903+",
  "hostName": "9.20.187.127",
  "applicationId": "app-20160303155742-0005",
  "executorId": "driver",
  "BlockManager_memory_maxMem_MB": 3933
}

The appId and executorId I extract from the metric name. When the sink is 
instantiated I don't believe I have access to any Utils to obtain the current 
appId and executorId, so I'm kind of relying on these being in the metric name 
for the moment.

Is it possible to make appId, applicationName, executorId available to me via 
some Utils function that I have access to in a metrics Sink?

> Using AppName instead of AppId in the name of all metrics
> -
>
> Key: SPARK-10610
> URL: https://issues.apache.org/jira/browse/SPARK-10610
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 1.5.0
>Reporter: Yi Tian
>Priority: Minor
>
> When we use {{JMX}} to monitor the Spark system, we have to configure the names 
> of the target metrics in the monitoring system. But the current name of a metric 
> is {{appId}} + {{executorId}} + {{source}}, so when the Spark program is 
> restarted we have to update the metric names in the monitoring system.
> We should add an optional configuration property to control whether to use the 
> appName instead of the appId in the Spark metrics system.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10610) Using AppName instead of AppId in the name of all metrics

2016-03-03 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179507#comment-15179507
 ] 

Pete Robbins commented on SPARK-10610:
--

I think the appId is an important piece of information when visualizing the 
metrics, along with hostname, executorId, etc. I'm writing a sink and reporter 
to push the metrics to Elasticsearch and I include these in the metrics types 
for better correlation, e.g.:

{
  "timestamp": "2016-03-03T15:58:31.903+",
  "hostName": "9.20.187.127",
  "applicationId": "app-20160303155742-0005",
  "executorId": "driver",
  "BlockManager_memory_maxMem_MB": 3933
}

The appId and executorId I extract from the metric name. When the sink is 
instantiated I don't believe I have access to any Utils to obtain the current 
appId and executorId, so I'm kind of relying on these being in the metric name 
for the moment.

Is it possible to make appId, applicationName, executorId available to me via 
some Utils function that I have access to in a metrics Sink?

> Using AppName instead of AppId in the name of all metrics
> -
>
> Key: SPARK-10610
> URL: https://issues.apache.org/jira/browse/SPARK-10610
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 1.5.0
>Reporter: Yi Tian
>Priority: Minor
>
> When we use {{JMX}} to monitor the Spark system, we have to configure the names 
> of the target metrics in the monitoring system. But the current name of a metric 
> is {{appId}} + {{executorId}} + {{source}}, so when the Spark program is 
> restarted we have to update the metric names in the monitoring system.
> We should add an optional configuration property to control whether to use the 
> appName instead of the appId in the Spark metrics system.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12647) 1.6 branch test failure o.a.s.sql.execution.ExchangeCoordinatorSuite.determining the number of reducers: aggregate operator

2016-01-05 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082879#comment-15082879
 ] 

Pete Robbins commented on SPARK-12647:
--

@sowen should I close this and move the PR?


> 1.6 branch test failure 
> o.a.s.sql.execution.ExchangeCoordinatorSuite.determining the number of 
> reducers: aggregate operator
> ---
>
> Key: SPARK-12647
> URL: https://issues.apache.org/jira/browse/SPARK-12647
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Pete Robbins
>Priority: Minor
>
> All 1.6 branch builds failing eg 
> https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-1.6-test-maven-pre-yarn-2.0.0-mr1-cdh4.1.2/lastCompletedBuild/testReport/org.apache.spark.sql.execution/ExchangeCoordinatorSuite/determining_the_number_of_reducers__aggregate_operator/
> 3 did not equal 2
> PR for SPARK-12470 causes change in partition size so test needs updating



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-12647) 1.6 branch test failure o.a.s.sql.execution.ExchangeCoordinatorSuite.determining the number of reducers: aggregate operator

2016-01-05 Thread Pete Robbins (JIRA)
Pete Robbins created SPARK-12647:


 Summary: 1.6 branch test failure 
o.a.s.sql.execution.ExchangeCoordinatorSuite.determining the number of 
reducers: aggregate operator
 Key: SPARK-12647
 URL: https://issues.apache.org/jira/browse/SPARK-12647
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.6.0
Reporter: Pete Robbins
Priority: Minor


All 1.6 branch builds failing eg 
https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-1.6-test-maven-pre-yarn-2.0.0-mr1-cdh4.1.2/lastCompletedBuild/testReport/org.apache.spark.sql.execution/ExchangeCoordinatorSuite/determining_the_number_of_reducers__aggregate_operator/

3 did not equal 2

PR for SPARK-12470 causes change in partition size so test needs updating



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-12647) 1.6 branch test failure o.a.s.sql.execution.ExchangeCoordinatorSuite.determining the number of reducers: aggregate operator

2016-01-05 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082879#comment-15082879
 ] 

Pete Robbins edited comment on SPARK-12647 at 1/5/16 11:30 AM:
---

[~sowen] should I close this and move the PR?



was (Author: robbinspg):
@sowen should I close this and move the PR?


> 1.6 branch test failure 
> o.a.s.sql.execution.ExchangeCoordinatorSuite.determining the number of 
> reducers: aggregate operator
> ---
>
> Key: SPARK-12647
> URL: https://issues.apache.org/jira/browse/SPARK-12647
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Pete Robbins
>Priority: Minor
>
> All 1.6 branch builds failing eg 
> https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-1.6-test-maven-pre-yarn-2.0.0-mr1-cdh4.1.2/lastCompletedBuild/testReport/org.apache.spark.sql.execution/ExchangeCoordinatorSuite/determining_the_number_of_reducers__aggregate_operator/
> 3 did not equal 2
> PR for SPARK-12470 causes change in partition size so test needs updating



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12470) Incorrect calculation of row size in o.a.s.sql.catalyst.expressions.codegen.GenerateUnsafeRowJoiner

2015-12-22 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15068693#comment-15068693
 ] 

Pete Robbins commented on SPARK-12470:
--

I'm fairly sure the code in my PR is correct but it is causing an 
ExchangeCoordinatorSuite test to fail. I'm struggling to see why this test is 
failing with the change I made. The failure is:

 determining the number of reducers: aggregate operator *** FAILED ***
 3 did not equal 2 (ExchangeCoordinatorSuite.scala:316)

Putting some debug into the test, I see that before my change the pre-shuffle 
partition sizes are 600, 600, 600, 600, 600 and after my change they are 
800, 800, 800, 800, 720, but I have no idea why. I'd really appreciate anyone 
with knowledge of this area a) checking my PR and b) helping explain the 
failing test.
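
For reference, the kind of correction I mean is essentially this (a sketch only, 
not necessarily the exact change in the PR; it assumes 8-byte words):

// Sketch only: the reduction is computed in 8-byte words, so convert it to
// bytes before subtracting it from the byte-based row size.
def joinedSizeInBytes(sizeInBytes: Int,
                      bitset1Words: Int,
                      bitset2Words: Int,
                      outputBitsetWords: Int): Int = {
  val sizeReductionInWords = bitset1Words + bitset2Words - outputBitsetWords
  sizeInBytes - sizeReductionInWords * 8 // words -> bytes
}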

> Incorrect calculation of row size in 
> o.a.s.sql.catalyst.expressions.codegen.GenerateUnsafeRowJoiner
> ---
>
> Key: SPARK-12470
> URL: https://issues.apache.org/jira/browse/SPARK-12470
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.2
>Reporter: Pete Robbins
>Priority: Minor
>
> While looking into https://issues.apache.org/jira/browse/SPARK-12319 I 
> noticed that the row size is incorrectly calculated.
> The "sizeReduction" value is calculated in words:
>// The number of words we can reduce when we concat two rows together.
> // The only reduction comes from merging the bitset portion of the two 
> rows, saving 1 word.
> val sizeReduction = bitset1Words + bitset2Words - outputBitsetWords
> but then it is subtracted from the size of the row in bytes:
>|out.pointTo(buf, ${schema1.size + schema2.size}, sizeInBytes - 
> $sizeReduction);
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-12470) Incorrect calculation of row size in o.a.s.sql.catalyst.expressions.codegen.GenerateUnsafeRowJoiner

2015-12-22 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15068693#comment-15068693
 ] 

Pete Robbins edited comment on SPARK-12470 at 12/22/15 9:47 PM:


I'm fairly sure the code in my PR is correct but it is causing an 
ExchangeCoordinatorSuite test to fail. I'm struggling to see why this test is 
failing with the change I made. The failure is:

 determining the number of reducers: aggregate operator *** FAILED ***
 3 did not equal 2 (ExchangeCoordinatorSuite.scala:316)

Putting some debug into the test, I see that before my change the pre-shuffle 
partition sizes are 600, 600, 600, 600, 600 and after my change they are 
800, 800, 800, 800, 720, but I have no idea why. I'd really appreciate anyone 
with knowledge of this area a) checking my PR and b) helping explain the 
failing test.

EDIT: Please ignore. Merged with the latest head, including the changes for 
SPARK-12388; all tests now pass.


was (Author: robbinspg):
I'm fairly sure the code in my PR is correct but it is causing an 
ExchangeCoordinatorSuite test to fail. I'm struggling to see why this test is 
failing with the change I made. The failure is:

 determining the number of reducers: aggregate operator *** FAILED ***
 3 did not equal 2 (ExchangeCoordinatorSuite.scala:316)

Putting some debug into the test, I see that before my change the pre-shuffle 
partition sizes are 600, 600, 600, 600, 600 and after my change they are 
800, 800, 800, 800, 720, but I have no idea why. I'd really appreciate anyone 
with knowledge of this area a) checking my PR and b) helping explain the 
failing test.

> Incorrect calculation of row size in 
> o.a.s.sql.catalyst.expressions.codegen.GenerateUnsafeRowJoiner
> ---
>
> Key: SPARK-12470
> URL: https://issues.apache.org/jira/browse/SPARK-12470
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.2
>Reporter: Pete Robbins
>Priority: Minor
>
> While looking into https://issues.apache.org/jira/browse/SPARK-12319 I 
> noticed that the row size is incorrectly calculated.
> The "sizeReduction" value is calculated in words:
>// The number of words we can reduce when we concat two rows together.
> // The only reduction comes from merging the bitset portion of the two 
> rows, saving 1 word.
> val sizeReduction = bitset1Words + bitset2Words - outputBitsetWords
> but then it is subtracted from the size of the row in bytes:
>|out.pointTo(buf, ${schema1.size + schema2.size}, sizeInBytes - 
> $sizeReduction);
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12470) Incorrect calculation of row size in o.a.s.sql.catalyst.expressions.codegen.GenerateUnsafeRowJoiner

2015-12-21 Thread Pete Robbins (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pete Robbins updated SPARK-12470:
-
Component/s: SQL
Summary: Incorrect calculation of row size in 
o.a.s.sql.catalyst.expressions.codegen.GenerateUnsafeRowJoiner  (was: Incorrect 
calculation of row size in 
o.a.s.catalyst.expressions.codegen.GenerateUnsafeRowJoiner)

> Incorrect calculation of row size in 
> o.a.s.sql.catalyst.expressions.codegen.GenerateUnsafeRowJoiner
> ---
>
> Key: SPARK-12470
> URL: https://issues.apache.org/jira/browse/SPARK-12470
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.2
>Reporter: Pete Robbins
>Priority: Minor
>
> While looking into https://issues.apache.org/jira/browse/SPARK-12319 I 
> noticed that the row size is incorrectly calculated.
> The "sizeReduction" value is calculated in words:
>// The number of words we can reduce when we concat two rows together.
> // The only reduction comes from merging the bitset portion of the two 
> rows, saving 1 word.
> val sizeReduction = bitset1Words + bitset2Words - outputBitsetWords
> but then it is subtracted from the size of the row in bytes:
>|out.pointTo(buf, ${schema1.size + schema2.size}, sizeInBytes - 
> $sizeReduction);
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-12470) Incorrect calculation of row size in o.a.s.catalyst.expressions.codegen.GenerateUnsafeRowJoiner

2015-12-21 Thread Pete Robbins (JIRA)
Pete Robbins created SPARK-12470:


 Summary: Incorrect calculation of row size in 
o.a.s.catalyst.expressions.codegen.GenerateUnsafeRowJoiner
 Key: SPARK-12470
 URL: https://issues.apache.org/jira/browse/SPARK-12470
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.5.2
Reporter: Pete Robbins
Priority: Minor


While looking into https://issues.apache.org/jira/browse/SPARK-12319 I noticed 
that the row size is incorrectly calculated.

The "sizeReduction" value is calculated in words:

   // The number of words we can reduce when we concat two rows together.
// The only reduction comes from merging the bitset portion of the two 
rows, saving 1 word.
val sizeReduction = bitset1Words + bitset2Words - outputBitsetWords

but then it is subtracted from the size of the row in bytes:

   |out.pointTo(buf, ${schema1.size + schema2.size}, sizeInBytes - 
$sizeReduction);
 





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6873) Some Hive-Catalyst comparison tests fail due to unimportant order of some printed elements

2015-09-25 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14907841#comment-14907841
 ] 

Pete Robbins commented on SPARK-6873:
-

I no longer see these errors in my 1.5 branch Java 8 build. Did someone fix 
them, remove the tests or is it just chance?

> Some Hive-Catalyst comparison tests fail due to unimportant order of some 
> printed elements
> --
>
> Key: SPARK-6873
> URL: https://issues.apache.org/jira/browse/SPARK-6873
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 1.3.1
>Reporter: Sean Owen
>Assignee: Cheng Lian
>Priority: Minor
>
> As I mentioned, I've been seeing 4 test failures in Hive tests for a while, 
> and actually it still affects master. I think it's a superficial problem that 
> only turns up when running on Java 8, but still, would probably be an easy 
> fix and good to fix.
> Specifically, here are four tests and the bit that fails the comparison, 
> below. I tried to diagnose this but had trouble even finding where some of 
> this occurs, like the list of synonyms?
> {code}
> - show_tblproperties *** FAILED ***
>   Results do not match for show_tblproperties:
> ...
>   !== HIVE - 2 row(s) ==   == CATALYST - 2 row(s) ==
>   !tmptruebar bar value
>   !barbar value   tmp true (HiveComparisonTest.scala:391)
> {code}
> {code}
> - show_create_table_serde *** FAILED ***
>   Results do not match for show_create_table_serde:
> ...
>WITH SERDEPROPERTIES (  WITH 
> SERDEPROPERTIES ( 
>   !  'serialization.format'='$', 
> 'field.delim'=',', 
>   !  'field.delim'=',')  
> 'serialization.format'='$')
> {code}
> {code}
> - udf_std *** FAILED ***
>   Results do not match for udf_std:
> ...
>   !== HIVE - 2 row(s) == == CATALYST 
> - 2 row(s) ==
>std(x) - Returns the standard deviation of a set of numbers   std(x) - 
> Returns the standard deviation of a set of numbers
>   !Synonyms: stddev_pop, stddev  Synonyms: 
> stddev, stddev_pop (HiveComparisonTest.scala:391)
> {code}
> {code}
> - udf_stddev *** FAILED ***
>   Results do not match for udf_stddev:
> ...
>   !== HIVE - 2 row(s) ==== 
> CATALYST - 2 row(s) ==
>stddev(x) - Returns the standard deviation of a set of numbers   stddev(x) 
> - Returns the standard deviation of a set of numbers
>   !Synonyms: stddev_pop, stdSynonyms: 
> std, stddev_pop (HiveComparisonTest.scala:391)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9710) RPackageUtilsSuite fails if R is not installed

2015-09-23 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14904485#comment-14904485
 ] 

Pete Robbins commented on SPARK-9710:
-

The Fix Version for this says 1.5.0, but as far as I can see the PR is not in 
the 1.5 branch, and my 1.5 branch build is failing with this issue.

> RPackageUtilsSuite fails if R is not installed
> --
>
> Key: SPARK-9710
> URL: https://issues.apache.org/jira/browse/SPARK-9710
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.5.0
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
> Fix For: 1.5.0
>
>
> That's because there's a bug in RUtils.scala. PR soon.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10454) Flaky test: o.a.s.scheduler.DAGSchedulerSuite.late fetch failures don't cause multiple concurrent attempts for the same map stage

2015-09-04 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731377#comment-14731377
 ] 

Pete Robbins commented on SPARK-10454:
--

This is another case of not waiting for events to drain from the listenerBus.

> Flaky test: o.a.s.scheduler.DAGSchedulerSuite.late fetch failures don't cause 
> multiple concurrent attempts for the same map stage
> -
>
> Key: SPARK-10454
> URL: https://issues.apache.org/jira/browse/SPARK-10454
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler, Spark Core
>Affects Versions: 1.5.1
>Reporter: Pete Robbins
>Priority: Minor
>
> test case fails intermittently in Jenkins.
> For eg, see the following builds-
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41991/
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41999/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10454) Flaky test: o.a.s.scheduler.DAGSchedulerSuite.late fetch failures don't cause multiple concurrent attempts for the same map stage

2015-09-04 Thread Pete Robbins (JIRA)
Pete Robbins created SPARK-10454:


 Summary: Flaky test: o.a.s.scheduler.DAGSchedulerSuite.late fetch 
failures don't cause multiple concurrent attempts for the same map stage
 Key: SPARK-10454
 URL: https://issues.apache.org/jira/browse/SPARK-10454
 Project: Spark
  Issue Type: Bug
  Components: Scheduler, Spark Core
Affects Versions: 1.5.1
Reporter: Pete Robbins
Priority: Minor


test case fails intermittently in Jenkins.

For eg, see the following builds-
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41991/
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41999/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10431) Intermittent test failure in InputOutputMetricsSuite

2015-09-03 Thread Pete Robbins (JIRA)
Pete Robbins created SPARK-10431:


 Summary: Intermittent test failure in InputOutputMetricsSuite
 Key: SPARK-10431
 URL: https://issues.apache.org/jira/browse/SPARK-10431
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.5.0
Reporter: Pete Robbins
Priority: Minor


I sometimes get test failures such as:

- input metrics with cache and coalesce *** FAILED ***
  5994472 did not equal 6044472 (InputOutputMetricsSuite.scala:101)

Tracking this down by adding some debug, it seems this is a timing issue in the 
test.

test("input metrics with cache and coalesce") {
// prime the cache manager
val rdd = sc.textFile(tmpFilePath, 4).cache()
rdd.collect() // <== #1

val bytesRead = runAndReturnBytesRead {  // <== #2
  rdd.count()
}
val bytesRead2 = runAndReturnBytesRead {
  rdd.coalesce(4).count()
}

// for count and coelesce, the same bytes should be read.
assert(bytesRead != 0)
assert(bytesRead2 == bytesRead) // fails
  }

What is happening is that the runAndReturnBytesRead (#2) function adds a 
SparkListener to monitor TaskEnd events in order to total the bytes read from, 
e.g., the rdd.count().

In the case where this fails the listener receives a TaskEnd event from earlier 
tasks (e.g. #1) and this mucks up the totalling. This happens because the 
asynchronous thread processing the event queue and notifying the listeners has 
not processed one of the taskEnd events before the new listener is added, so it 
also receives that event.

There is a simple fix to the test to wait for the event queue to be empty 
before adding the new listener and I will submit a pull request for that.
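
The change is roughly this (a sketch only, not the PR itself; it assumes the 
helper runs inside the org.apache.spark test package, where the private[spark] 
listenerBus is visible):

import org.apache.spark.SparkContext
import org.apache.spark.scheduler.SparkListener

// Sketch only: drain any TaskEnd events left over from earlier jobs before
// registering the byte-counting listener, so it only sees the job it measures.
def drainThenRegister(sc: SparkContext, listener: SparkListener): Unit = {
  sc.listenerBus.waitUntilEmpty(500) // timeout in milliseconds
  sc.addSparkListener(listener)
}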

I also notice that a lot of the tests add a listener and, as there is no 
removeSparkListener API, the number of listeners on the context builds up during 
the running of the suite. This is probably why I see this issue running on slow 
machines.

A wider question may be: should a listener receive events that occurred before 
it was added?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9869) InputStreamSuite.socket input stream is flaky in Jenkins

2015-09-03 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729024#comment-14729024
 ] 

Pete Robbins commented on SPARK-9869:
-

My pull request build for https://issues.apache.org/jira/browse/SPARK-10431 
failed with this error so I took a look. I think it's another timing issue 
where the assert on the progress listener occurs before the asynchronous 
notification thread has completed processing.

Something like this should fix it:

diff --git 
a/streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala 
b/streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala
index ec2852d..e27d315 100644
--- 
a/streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala
+++ 
b/streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala
@@ -76,6 +76,8 @@
   fail("Timeout: cannot finish all batches in 30 seconds")
 }
 
+ssc.scheduler.listenerBus.waitUntilEmpty(500)
+
 // Verify all "InputInfo"s have been reported
 assert(ssc.progressListener.numTotalReceivedRecords === input.size)
 assert(ssc.progressListener.numTotalProcessedRecords === input.size)


> InputStreamSuite.socket input stream is flaky in Jenkins
> 
>
> Key: SPARK-9869
> URL: https://issues.apache.org/jira/browse/SPARK-9869
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Reporter: Josh Rosen
>  Labels: flaky-test
>
> https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-1.5-SBT/68/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.3,label=centos/testReport/junit/org.apache.spark.streaming/InputStreamsSuite/socket_input_stream/
> {code}
> org.apache.spark.streaming.InputStreamsSuite.socket input stream
> sbt.ForkMain$ForkError: 4 did not equal 5
>   at 
> org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500)
>   at 
> org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555)
>   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466)
>   at 
> org.apache.spark.streaming.InputStreamsSuite$$anonfun$1$$anonfun$apply$mcV$sp$4$$anonfun$apply$5.apply(InputStreamsSuite.scala:80)
>   at 
> org.apache.spark.streaming.InputStreamsSuite$$anonfun$1$$anonfun$apply$mcV$sp$4$$anonfun$apply$5.apply(InputStreamsSuite.scala:53)
>   at 
> org.apache.spark.streaming.TestSuiteBase$class.withStreamingContext(TestSuiteBase.scala:272)
>   at 
> org.apache.spark.streaming.InputStreamsSuite.withStreamingContext(InputStreamsSuite.scala:45)
>   at 
> org.apache.spark.streaming.InputStreamsSuite$$anonfun$1$$anonfun$apply$mcV$sp$4.apply(InputStreamsSuite.scala:53)
>   at 
> org.apache.spark.streaming.InputStreamsSuite$$anonfun$1$$anonfun$apply$mcV$sp$4.apply(InputStreamsSuite.scala:48)
>   at 
> org.apache.spark.streaming.TestSuiteBase$class.withTestServer(TestSuiteBase.scala:289)
>   at 
> org.apache.spark.streaming.InputStreamsSuite.withTestServer(InputStreamsSuite.scala:45)
>   at 
> org.apache.spark.streaming.InputStreamsSuite$$anonfun$1.apply$mcV$sp(InputStreamsSuite.scala:48)
>   at 
> org.apache.spark.streaming.InputStreamsSuite$$anonfun$1.apply(InputStreamsSuite.scala:48)
>   at 
> org.apache.spark.streaming.InputStreamsSuite$$anonfun$1.apply(InputStreamsSuite.scala:48)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9869) InputStreamSuite.socket input stream is flaky in Jenkins

2015-09-03 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729030#comment-14729030
 ] 

Pete Robbins commented on SPARK-9869:
-

Should I add this change to the PR for SPARK-10431 or submit a separate PR for this?

> InputStreamSuite.socket input stream is flaky in Jenkins
> 
>
> Key: SPARK-9869
> URL: https://issues.apache.org/jira/browse/SPARK-9869
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Reporter: Josh Rosen
>  Labels: flaky-test
>
> https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-1.5-SBT/68/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.3,label=centos/testReport/junit/org.apache.spark.streaming/InputStreamsSuite/socket_input_stream/
> {code}
> org.apache.spark.streaming.InputStreamsSuite.socket input stream
> sbt.ForkMain$ForkError: 4 did not equal 5
>   at 
> org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500)
>   at 
> org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555)
>   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466)
>   at 
> org.apache.spark.streaming.InputStreamsSuite$$anonfun$1$$anonfun$apply$mcV$sp$4$$anonfun$apply$5.apply(InputStreamsSuite.scala:80)
>   at 
> org.apache.spark.streaming.InputStreamsSuite$$anonfun$1$$anonfun$apply$mcV$sp$4$$anonfun$apply$5.apply(InputStreamsSuite.scala:53)
>   at 
> org.apache.spark.streaming.TestSuiteBase$class.withStreamingContext(TestSuiteBase.scala:272)
>   at 
> org.apache.spark.streaming.InputStreamsSuite.withStreamingContext(InputStreamsSuite.scala:45)
>   at 
> org.apache.spark.streaming.InputStreamsSuite$$anonfun$1$$anonfun$apply$mcV$sp$4.apply(InputStreamsSuite.scala:53)
>   at 
> org.apache.spark.streaming.InputStreamsSuite$$anonfun$1$$anonfun$apply$mcV$sp$4.apply(InputStreamsSuite.scala:48)
>   at 
> org.apache.spark.streaming.TestSuiteBase$class.withTestServer(TestSuiteBase.scala:289)
>   at 
> org.apache.spark.streaming.InputStreamsSuite.withTestServer(InputStreamsSuite.scala:45)
>   at 
> org.apache.spark.streaming.InputStreamsSuite$$anonfun$1.apply$mcV$sp(InputStreamsSuite.scala:48)
>   at 
> org.apache.spark.streaming.InputStreamsSuite$$anonfun$1.apply(InputStreamsSuite.scala:48)
>   at 
> org.apache.spark.streaming.InputStreamsSuite$$anonfun$1.apply(InputStreamsSuite.scala:48)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9644) Support update DecimalType with precision > 18 in UnsafeRow

2015-08-19 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14702705#comment-14702705
 ] 

Pete Robbins commented on SPARK-9644:
-

Thanks. I think this needs to go in to 1.5 to make that available on IBM jdks.

 Support update DecimalType with precision > 18 in UnsafeRow
 ---

 Key: SPARK-9644
 URL: https://issues.apache.org/jira/browse/SPARK-9644
 Project: Spark
  Issue Type: Improvement
Affects Versions: 1.5.0
Reporter: Davies Liu
Assignee: Davies Liu
Priority: Critical
 Fix For: 1.5.0


 Currently, we don't support using DecimalType with precision > 18 in new 
 unsafe aggregation, it's good to support it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9644) Support update DecimalType with precision > 18 in UnsafeRow

2015-08-18 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14701202#comment-14701202
 ] 

Pete Robbins commented on SPARK-9644:
-

This change makes assumptions about the implementation of BigInteger, such as 
field names and their contents. This may work with the current OpenJDK 
implementation but is not guaranteed to work in the future. Also, this causes an 
abort on IBM JDKs, as they have a different underlying implementation of BigInteger.

The code should not access the private fields signum and mag but should use the 
defined Java API methods signum() and toByteArray() to extract the required 
values.

I can create a patch to correct this.
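
A sketch of what I mean (not the patch itself), using only the public 
java.math.BigInteger API:

import java.math.BigInteger

// Sketch only: extract sign and magnitude via public methods rather than the
// private signum/mag fields, and rebuild via the public constructor.
def toSignAndMagnitude(b: BigInteger): (Int, Array[Byte]) =
  (b.signum(), b.abs().toByteArray())

def fromSignAndMagnitude(signum: Int, magnitude: Array[Byte]): BigInteger =
  new BigInteger(signum, magnitude)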

 Support update DecimalType with precision > 18 in UnsafeRow
 ---

 Key: SPARK-9644
 URL: https://issues.apache.org/jira/browse/SPARK-9644
 Project: Spark
  Issue Type: Improvement
Affects Versions: 1.5.0
Reporter: Davies Liu
Assignee: Davies Liu
Priority: Critical
 Fix For: 1.5.0


 Currently, we don't support using DecimalType with precision > 18 in new 
 unsafe aggregation, it's good to support it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9644) Support update DecimalType with precision > 18 in UnsafeRow

2015-08-18 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14701926#comment-14701926
 ] 

Pete Robbins commented on SPARK-9644:
-

How much slower is the public API? We would only be creating a small byte 
array on the heap and then copying.
On deserialization I doubt there is any performance gain to creating an empty 
BigInteger then copying in the signum and magnitude array vs simply 
constructing the object passing those as parameters.

The main issue is that the fast path for OpenJDK that you propose may only work 
for the current implementation of BigInteger, leaving the potential for a JDK 
update to break Spark.

It would be possible to have a separate codepath for the current OpenJDK 
implementation but I don't think that is desirable.


 Support update DecimalType with precision > 18 in UnsafeRow
 ---

 Key: SPARK-9644
 URL: https://issues.apache.org/jira/browse/SPARK-9644
 Project: Spark
  Issue Type: Improvement
Affects Versions: 1.5.0
Reporter: Davies Liu
Assignee: Davies Liu
Priority: Critical
 Fix For: 1.5.0


 Currently, we don't support using DecimalType with precision > 18 in new 
 unsafe aggregation, it's good to support it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6873) Some Hive-Catalyst comparison tests fail due to unimportant order of some printed elements

2015-07-30 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647322#comment-14647322
 ] 

Pete Robbins commented on SPARK-6873:
-

We've been trying to get a clean build/test using Java 8 and we see these 
errors so I think this is still a problem.

It looks like the Catalyst output changes from Java 7 to Java 8. Is the 
ordering supposed to be defined for this or is the ordering really unimportant?

 Some Hive-Catalyst comparison tests fail due to unimportant order of some 
 printed elements
 --

 Key: SPARK-6873
 URL: https://issues.apache.org/jira/browse/SPARK-6873
 Project: Spark
  Issue Type: Bug
  Components: SQL, Tests
Affects Versions: 1.3.1
Reporter: Sean Owen
Assignee: Cheng Lian
Priority: Minor

 As I mentioned, I've been seeing 4 test failures in Hive tests for a while, 
 and actually it still affects master. I think it's a superficial problem that 
 only turns up when running on Java 8, but still, would probably be an easy 
 fix and good to fix.
 Specifically, here are four tests and the bit that fails the comparison, 
 below. I tried to diagnose this but had trouble even finding where some of 
 this occurs, like the list of synonyms?
 {code}
 - show_tblproperties *** FAILED ***
   Results do not match for show_tblproperties:
 ...
   !== HIVE - 2 row(s) ==   == CATALYST - 2 row(s) ==
   !tmptruebar bar value
   !barbar value   tmp true (HiveComparisonTest.scala:391)
 {code}
 {code}
 - show_create_table_serde *** FAILED ***
   Results do not match for show_create_table_serde:
 ...
WITH SERDEPROPERTIES (  WITH 
 SERDEPROPERTIES ( 
   !  'serialization.format'='$', 
 'field.delim'=',', 
   !  'field.delim'=',')  
 'serialization.format'='$')
 {code}
 {code}
 - udf_std *** FAILED ***
   Results do not match for udf_std:
 ...
   !== HIVE - 2 row(s) == == CATALYST 
 - 2 row(s) ==
std(x) - Returns the standard deviation of a set of numbers   std(x) - 
 Returns the standard deviation of a set of numbers
   !Synonyms: stddev_pop, stddev  Synonyms: 
 stddev, stddev_pop (HiveComparisonTest.scala:391)
 {code}
 {code}
 - udf_stddev *** FAILED ***
   Results do not match for udf_stddev:
 ...
   !== HIVE - 2 row(s) ==== 
 CATALYST - 2 row(s) ==
stddev(x) - Returns the standard deviation of a set of numbers   stddev(x) 
 - Returns the standard deviation of a set of numbers
   !Synonyms: stddev_pop, stdSynonyms: 
 std, stddev_pop (HiveComparisonTest.scala:391)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] Assigned: (TUSCANY-2041) Repeated nill elements of extended type cause Parser found unknown element exception

2008-02-12 Thread Pete Robbins (JIRA)

 [ 
https://issues.apache.org/jira/browse/TUSCANY-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pete Robbins reassigned TUSCANY-2041:
-

Assignee: Pete Robbins

 Repeated nill elements of extended type cause Parser found unknown element 
 exception
 --

 Key: TUSCANY-2041
 URL: https://issues.apache.org/jira/browse/TUSCANY-2041
 Project: Tuscany
  Issue Type: Bug
  Components: C++ SDO
Affects Versions: Cpp-Next
 Environment: XP SP2, VC7 
Reporter: Simon Laws
Assignee: Pete Robbins
 Fix For: Cpp-Next


 With the schema
 <schema xmlns="http://www.w3.org/2001/XMLSchema" 
 targetNamespace="http://www.example.org/AnnonTypes" 
 xmlns:tns="http://www.example.org/AnnonTypes" 
 elementFormDefault="qualified">
 
   <element name="Top">
     <complexType>
       <sequence>
         <element name="attribute" nillable="true" minOccurs="0" maxOccurs="unbounded">
           <complexType>
             <simpleContent>
               <extension base="string">
                 <attribute name="name" type="string" use="required"/>
               </extension>
             </simpleContent>
           </complexType>
         </element>
       </sequence>
     </complexType>
   </element>
 </schema>
 And XML
 <tns:Top xmlns:tns="http://www.example.org/AnnonTypes" 
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
   xsi:schemaLocation="http://www.example.org/AnnonTypes AnnonTypes2.xsd">
 
   <tns:attribute name="ABC" xsi:nil="true" 
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>
   <!--tns:attribute name="DEF" xsi:nil="true" 
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>
   <tns:attribute name="GHI" xsi:nil="true" 
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>
   <tns:attribute name="JKL" xsi:nil="true" 
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"--> 
 </tns:Top>
 When multiple attribute elements are present the following error is reported. 
 SDO_DAS_XML_ParserException Object
 (
 [message:protected] = SDO_DAS_XML::loadFile - Unable to parse the 
 supplied
 xml file
 1 parse error(s) occurred when parsing the file 'AnnonTypes2.xml':
 1. Parser found unknown element attribute
 [string:private] =
 [code:protected] = 0
 [file:protected] = 
 C:\simon\php\workspace\php-branch\phpscripts\chrisdougla
 s\test.php
 [line:protected] = 52
 [trace:private] = Array
 (
 [0] = Array
 (
 [file] = 
 C:\simon\php\workspace\php-branch\phpscripts\chris
 douglas\test.php
 [line] = 52
 [function] = loadFile
 [class] = SDO_DAS_XML
 [type] = -
 [args] = Array
 (
 [0] = AnnonTypes2.xml
 )
 )
 )
 [cause] =
 )

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Commented: (TUSCANY-2041) Repeated nill elements of extended type cause Parser found unknown element exception

2008-02-12 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/TUSCANY-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12567999#action_12567999
 ] 

Pete Robbins commented on TUSCANY-2041:
---

I checked in a fix for this into the branch. If you could test it out and it 
works OK, I'll create a patch for HEAD as well.

 Repeated nill elements of extended type cause Parser found unknown element 
 exception
 --

 Key: TUSCANY-2041
 URL: https://issues.apache.org/jira/browse/TUSCANY-2041
 Project: Tuscany
  Issue Type: Bug
  Components: C++ SDO
Affects Versions: Cpp-Next
 Environment: XP SP2, VC7 
Reporter: Simon Laws
Assignee: Pete Robbins
 Fix For: Cpp-Next


 With the schema
 <schema xmlns="http://www.w3.org/2001/XMLSchema" 
 targetNamespace="http://www.example.org/AnnonTypes" 
 xmlns:tns="http://www.example.org/AnnonTypes" 
 elementFormDefault="qualified">
 
   <element name="Top">
     <complexType>
       <sequence>
         <element name="attribute" nillable="true" minOccurs="0" maxOccurs="unbounded">
           <complexType>
             <simpleContent>
               <extension base="string">
                 <attribute name="name" type="string" use="required"/>
               </extension>
             </simpleContent>
           </complexType>
         </element>
       </sequence>
     </complexType>
   </element>
 </schema>
 And XML
 <tns:Top xmlns:tns="http://www.example.org/AnnonTypes" 
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
   xsi:schemaLocation="http://www.example.org/AnnonTypes AnnonTypes2.xsd">
 
   <tns:attribute name="ABC" xsi:nil="true" 
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>
   <!--tns:attribute name="DEF" xsi:nil="true" 
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>
   <tns:attribute name="GHI" xsi:nil="true" 
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>
   <tns:attribute name="JKL" xsi:nil="true" 
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"--> 
 </tns:Top>
 When multiple attribute elements are present the following error is reported. 
 SDO_DAS_XML_ParserException Object
 (
 [message:protected] = SDO_DAS_XML::loadFile - Unable to parse the 
 supplied
 xml file
 1 parse error(s) occurred when parsing the file 'AnnonTypes2.xml':
 1. Parser found unknown element attribute
 [string:private] =
 [code:protected] = 0
 [file:protected] = 
 C:\simon\php\workspace\php-branch\phpscripts\chrisdougla
 s\test.php
 [line:protected] = 52
 [trace:private] = Array
 (
 [0] = Array
 (
 [file] = 
 C:\simon\php\workspace\php-branch\phpscripts\chris
 douglas\test.php
 [line] = 52
 [function] = loadFile
 [class] = SDO_DAS_XML
 [type] = -
 [args] = Array
 (
 [0] = AnnonTypes2.xml
 )
 )
 )
 [cause] =
 )

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Closed: (TUSCANY-1529) Tuscany SDO native for windows is not msvc backwards compatible

2007-09-04 Thread Pete Robbins (JIRA)

 [ 
https://issues.apache.org/jira/browse/TUSCANY-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pete Robbins closed TUSCANY-1529.
-

Resolution: Fixed

 Tuscany SDO native for windows is not msvc backwards compatible 
 

 Key: TUSCANY-1529
 URL: https://issues.apache.org/jira/browse/TUSCANY-1529
 Project: Tuscany
  Issue Type: Bug
  Components: C++ SDO
Affects Versions: Cpp-M3
 Environment: MSVC 8.0 and 7.1
Reporter: Brady Johnson
Priority: Minor
 Fix For: Cpp-Next

 Attachments: tuscany_patch_jira1529


 I've been trying to compile Tuscany on platforms other than VSExpress (which 
 is msvc 8.0) and Linux. 
 I came across something that doesn't compile on msvc7.1 in SDODate.cpp. The 
 problem is the definition
 of localtime for windows. It's #define'd as localtime_s for windows. A simple 
 check for compiler version
 would allow it to be defined for msvc 8.0 and anything previous, as follows:
 #ifndef tuscany_localtime_r
 #if defined(WIN32) || defined (_WINDOWS)
   #if _MSC_VER < 1400 // _MSC_VER: 1400 is msvc 8.0, so anything less is pre 8.0
     #define tuscany_localtime_r(value, ignore) localtime(value);
   #else
     #define tuscany_localtime_r(value, tmp_tm) localtime_s(tmp_tm, value);
   #endif
 #else
   #define tuscany_localtime_r(value, tmp_tm) localtime_r(value, tmp_tm);
 #endif
 #endif // tuscany_localtime_r
 
 Brady Johnson
 Lead Software Developer - HydraSCA
 Rogue Wave Software - [EMAIL PROTECTED]

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Closed: (TUSCANY-1509) Change TuscanySDO Native build system to use ant

2007-09-04 Thread Pete Robbins (JIRA)

 [ 
https://issues.apache.org/jira/browse/TUSCANY-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pete Robbins closed TUSCANY-1509.
-

Resolution: Fixed

New JIRAs will be raised for any outstanding problems.

 Change TuscanySDO Native build system to use ant
 

 Key: TUSCANY-1509
 URL: https://issues.apache.org/jira/browse/TUSCANY-1509
 Project: Tuscany
  Issue Type: Improvement
  Components: C++ SDO
Affects Versions: Cpp-M3
 Environment: All platforms
Reporter: Brady Johnson
 Fix For: Cpp-Next

 Attachments: build.xml, README_ANT_INSTALL.txt, 
 tuscany_patch_update1_jira1509, tuscanySDONative_ant.jar


 In an effort to simplify the build process, I would like to propose switching 
 over to use ant instead of automake. It will be much easier to maintain, and 
 is used by many more developers today than automake.
 I have already converted most of TuscanySCA to ant. I based this SDO build 
 system on the SCA build system.
 
 Brady Johnson
 Lead Software Developer - HydraSCA
 Rogue Wave Software - [EMAIL PROTECTED]

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Commented: (TUSCANY-1566) Element coming out in the wrong namespace

2007-08-21 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/TUSCANY-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12521534
 ] 

Pete Robbins commented on TUSCANY-1566:
---

This should be fairly easy to fix. I think it is in the logic where we are 
writing a primitive as an element in a sequenced DO.

 Element coming out in the wrong namespace
 -

 Key: TUSCANY-1566
 URL: https://issues.apache.org/jira/browse/TUSCANY-1566
 Project: Tuscany
  Issue Type: Bug
  Components: C++ SDO
Affects Versions: Cpp-Next
 Environment: WinXP
Reporter: Matthew Peters
 Attachments: Atom1.0.xsd


 We have a schema file that defines an atom feed. It specified 
 elementFormDefault="qualified" so that lower level elements should be in the 
 target namespace. I will attach the schema as a separate file. With a very 
 simple php test case as follows:
 $xmldas = SDO_DAS_XML::create('Atom1.0.xsd');
 $document = $xmldas->createDocument('http://www.w3.org/2005/Atom', 'entry');
 $entry = $document->getRootDataObject();
 $author = $entry->createDataObject('author');
 $author->name[] = "Caroline Maynard";
 print $xmldas->saveString($document, 2);
 we get
 <?xml version="1.0" encoding="UTF-8"?>
 <tns:entry xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
 xmlns:tns="http://www.w3.org/2005/Atom">
   <tns:author>
     <name>Caroline Maynard</name>
   </tns:author>
 </tns:entry>
 whereas we should see the name element in the tns namespace.
 I have checked this with XERCES: the xml that we are generating will not 
 validate, whereas if I alter it to have name in the tns namespace it will. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Resolved: (TUSCANY-1566) Element coming out in the wrong namespace

2007-08-21 Thread Pete Robbins (JIRA)

 [ 
https://issues.apache.org/jira/browse/TUSCANY-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pete Robbins resolved TUSCANY-1566.
---

   Resolution: Fixed
Fix Version/s: Cpp-Next

Fixed in HEAD and the branch.

 Element coming out in the wrong namespace
 -

 Key: TUSCANY-1566
 URL: https://issues.apache.org/jira/browse/TUSCANY-1566
 Project: Tuscany
  Issue Type: Bug
  Components: C++ SDO
Affects Versions: Cpp-Next
 Environment: WinXP
Reporter: Matthew Peters
 Fix For: Cpp-Next

 Attachments: Atom1.0.xsd


 We have a schema file that defines an atom feed. It specified 
 elementFormDefault="qualified" so that lower level elements should be in the 
 target namespace. I will attach the schema as a separate file. With a very 
 simple php test case as follows:
 $xmldas = SDO_DAS_XML::create('Atom1.0.xsd');
 $document = $xmldas->createDocument('http://www.w3.org/2005/Atom', 'entry');
 $entry = $document->getRootDataObject();
 $author = $entry->createDataObject('author');
 $author->name[] = "Caroline Maynard";
 print $xmldas->saveString($document, 2);
 we get
 <?xml version="1.0" encoding="UTF-8"?>
 <tns:entry xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
 xmlns:tns="http://www.w3.org/2005/Atom">
   <tns:author>
     <name>Caroline Maynard</name>
   </tns:author>
 </tns:entry>
 whereas we should see the name element in the tns namespace.
 I have checked this with XERCES: the xml that we are generating will not 
 validate, whereas if I alter it to have name in the tns namespace it will. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Commented: (TUSCANY-1564) xsi:type not always set for complexTypes

2007-08-21 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/TUSCANY-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12521605
 ] 

Pete Robbins commented on TUSCANY-1564:
---

I have applied a patch to the branch only, which I believe works. I think that 
we should always write xsi:type information for properties that are of an 
abstract type. In this case the property 'request' is of abstract type 
'requestType', so we must specify what the real type of the property is.

I have tested this and it appears not to break anything else, so please can you 
try this out in PHP and let me know.

 xsi:type not always set for complexTypes
 

 Key: TUSCANY-1564
 URL: https://issues.apache.org/jira/browse/TUSCANY-1564
 Project: Tuscany
  Issue Type: Bug
  Components: C++ SDO
Affects Versions: Cpp-Next
 Environment: Win XP and Gentoo Linux
Reporter: Matthew Peters

 This has been reported by a user of the PHP SDO code. I have verified that he 
 is right - the problem does exist. I will cut and paste in the PHP example 
 from the defect http://pecl.php.net/bugs/bug.php?id=11774 but the php-ness of 
 the example is irrelevant: under the covers we are just manipulating a C++ 
 SDO and then calling XMLHelper->save()
 In the defect text below he puts in both expected and actual output. He is 
 right to raise the problem in the sense that I have tried reading in the 
 actual and expected xml under XERCES with schema validation turned on, and 
 the actual will *not* validate whereas the expected will. 
 Incidentally there is some history w.r.t. xsi:types - in a different case 
 they were coming out when we did not want them and they were suppressed for 
 us. See for example JIRA 1297. I do not know the rules which should determine 
 whether it should be present or not.
 Here follows the original PHP defect.
 Description:
 
 xsi:type is not always set for complexTypes. Notice the absence of 
 xsi:type="collectionInfo" in the actual output.
 Reproduce code:
 ---
 <?xml version="1.0" encoding="UTF-8"?>
 <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
   <xsd:element name="request" type="requestType"/>
   <xsd:complexType name="requestType" abstract="true"/>
   <xsd:complexType name="collectionInfo">
     <xsd:complexContent>
       <xsd:extension base="requestType">
         <xsd:sequence minOccurs="0" maxOccurs="unbounded">
           <xsd:element name="collection"/>
         </xsd:sequence>
         <xsd:attribute name="kind" type="xsd:string" fixed="collectionInfo"/>
       </xsd:extension>
     </xsd:complexContent>
   </xsd:complexType>
   <xsd:element name="request-list">
     <xsd:complexType>
       <xsd:sequence>
         <xsd:element ref="request" minOccurs="0" maxOccurs="unbounded"/>
       </xsd:sequence>
     </xsd:complexType>
   </xsd:element>
 </xsd:schema>
 <?php
 try {
   $xmldas = SDO_DAS_XML::create("request.xsd");
   try {
     $doc = $xmldas->createDocument('', 'request-list');
     $rdo = $doc->getRootDataObject();
     $request = $xmldas->createDataObject('', 'collectionInfo');
     $request->collection->insert('Blah');
     $request->kind = 'collectionInfo';
     $rdo->request->insert($request);
     print($xmldas->saveString($doc));
   } catch (SDO_Exception $e) {
     print($e);
   }
 } catch (SDO_Exception $e) {
   print("Problem creating an XML document: " . $e->getMessage());
 }
 ?>
 Expected result:
 ----------------
 <?xml version="1.0" encoding="UTF-8"?>
 <request-list xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
   <request kind="collectionInfo" xsi:type="collectionInfo">
     <collection>Blah</collection>
   </request>
 </request-list>
 Actual result:
 --------------
 <?xml version="1.0" encoding="UTF-8"?>
 <request-list xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
   <request kind="collectionInfo">
     <collection>Blah</collection>
   </request>
 </request-list>

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Commented: (TUSCANY-1504) getSequence() returns null with a complexType defined without mixed=true

2007-08-06 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/TUSCANY-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12517911
 ] 

Pete Robbins commented on TUSCANY-1504:
---

Matthew, the speciifcation is available from here:

http://www.osoa.org/display/Main/Service+Data+Objects+Specifications

 getSequence() returns null with a complexType defined without mixed=true
 --

 Key: TUSCANY-1504
 URL: https://issues.apache.org/jira/browse/TUSCANY-1504
 Project: Tuscany
  Issue Type: Bug
  Components: C++ SDO
Affects Versions: Cpp-M3
Reporter: Matthew Schultz
 Attachments: letter.xsd


 getSequence returns null if complexType does not have the mixed attribute or 
 the mixed attribute is set to false.  
 Looking at the code, SDOSchemaSAX2Parser::startComplexType and
 SDOSchemaSAX2Parser::defineType appear to be the two places that
 isSequenced is set. In startComplexType, it appears that mixed and
 sequence are both treated as attributes. I cannot tell if it ever
 reads the child sequence.
 It appears that isSequenced should be set to true on the basis of the
 child sequence and not on the basis of mixed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Resolved: (TUSCANY-1448) CppBigBank example windows deploy script deploy.bat fails to deploy XML Schema

2007-07-27 Thread Pete Robbins (JIRA)

 [ 
https://issues.apache.org/jira/browse/TUSCANY-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pete Robbins resolved TUSCANY-1448.
---

Resolution: Fixed

patch applied

 CppBigBank example windows deploy script deploy.bat fails to deploy XML 
 Schema
 

 Key: TUSCANY-1448
 URL: https://issues.apache.org/jira/browse/TUSCANY-1448
 Project: Tuscany
  Issue Type: Bug
  Components: C++ SCA
 Environment: windows
Reporter: Michael Yoder

 The CppBigBank example deploy.bat script for windows fails to copy its XML 
 Schema file to the deploy directory (resulting in its types not being loaded 
 when the sample is executed).
 This can be fixed by adding a schema copy line to the script:
 52a53
  copy %samplebbsrc%\*.xsd %samplebb%

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Resolved: (TUSCANY-1423) There are no tools to verify or display tuscany services

2007-07-27 Thread Pete Robbins (JIRA)

 [ 
https://issues.apache.org/jira/browse/TUSCANY-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pete Robbins resolved TUSCANY-1423.
---

Resolution: Fixed

I believe this is all applied now

 There are no tools to verify or display tuscany services
 

 Key: TUSCANY-1423
 URL: https://issues.apache.org/jira/browse/TUSCANY-1423
 Project: Tuscany
  Issue Type: New Feature
  Components: C++ SCA
Affects Versions: Cpp-M3
 Environment: All platforms
Reporter: Brady Johnson
Priority: Minor
 Fix For: Cpp-Next

 Attachments: build.xml, main.cpp, tuscanyScaJira1423_update2.tar, 
 TuscanyServiceLoader.cpp, TuscanyServiceLoader.cpp.model, 
 TuscanyServiceLoader.h


 According to Simon Laws thread SCA Toys? posted June 21, 2007, it would be 
 very useful to have a set of utilities or toys for TuscanySCA.
 I have created a toy for TuscanySCA C++ that loads tuscany services via 
 SCARuntime() and displays their information. This would be very
 useful for service development and verification.
 Following is the application's help usage:
 [EMAIL PROTECTED] bin]$ ./tuscanyServiceLoader -h
 Usage
 tuscanyDriver
 -ir Mandatory: Installation root where extensions are located: 
 ${TUSCANY_SCACPP}
 -sr Mandatory: System root where projects are located: 
 ${TUSCANY_SCACPP}/samples
 -sp Optional: System path
 -uri Optional: Base URI
 -dc Optional: Default Component name
 -model Optional: Display SCA Model Hierarchy
 -wsdl Optional: Display WSDL information
 -v Optional: Same as specifying both: -model and -wsdl
 
 Brady Johnson
 Lead Software Developer - HydraSCA
 Rogue Wave Software - [EMAIL PROTECTED]

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Resolved: (TUSCANY-1425) Compile failure on Fedora 6

2007-07-25 Thread Pete Robbins (JIRA)

 [ 
https://issues.apache.org/jira/browse/TUSCANY-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pete Robbins resolved TUSCANY-1425.
---

   Resolution: Fixed
Fix Version/s: Cpp-Next

Compile error fixed in branch and head

Warning fixed in TypeImpl.cpp as per patch

 Compile failure on Fedora 6
 ---

 Key: TUSCANY-1425
 URL: https://issues.apache.org/jira/browse/TUSCANY-1425
 Project: Tuscany
  Issue Type: Bug
  Components: C++ SDO
Affects Versions: Cpp-Next
 Environment: Fedora 6
Reporter: Caroline Maynard
Priority: Critical
 Fix For: Cpp-Next

 Attachments: tuscany-1425.patch


 A PHP user reports a compile failure on Fedora 6. The attached change to 
 SDOSchemaSAX2Parser.h sorts it, and seems benign on Win. 
 FWIW, I also attach a fix for a compiler warning in TypeImpl.cpp.
 The patch is against the pre-2.1 branch, which we're currently using for PHP, 
 but should be applied against the trunk too, please.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Resolved: (TUSCANY-1478) For schemas with elementFormDefault=true, serialized instance documents are invalid

2007-07-25 Thread Pete Robbins (JIRA)

 [ 
https://issues.apache.org/jira/browse/TUSCANY-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pete Robbins resolved TUSCANY-1478.
---

   Resolution: Fixed
Fix Version/s: Cpp-Next

Patch applied to HEAD and the sdo-cpp-pre2.1 branch

 For schemas with elementFormDefault=true, serialized instance documents are 
 invalid
 ---

 Key: TUSCANY-1478
 URL: https://issues.apache.org/jira/browse/TUSCANY-1478
 Project: Tuscany
  Issue Type: Bug
  Components: C++ SDO
 Environment: all
Reporter: Michael Yoder
 Fix For: Cpp-Next

 Attachments: TUSCANY-1478.txt


 This appears to be a regression in XML serialization. The SCA CppBigBank 
 example is currently failing to get a response from the StockQuote service 
 due to sending an invalid request. 
 Using the XML Schema embedded in StockQuoteService.wsdl, the following code:
 DataFactoryPtr mdg = DataFactory::getDataFactory();
 XSDHelperPtr xsh = HelperProvider::getXSDHelper(mdg);
 xsh->defineFile("StockQuoteService.wsdl");
 DataObjectPtr doObj = mdg->create("http://swanandmokashi.com", "GetQuotes");
 doObj->setCString("QuoteTicker", "IBM");
 XMLHelperPtr xmlHelper = HelperProvider::getXMLHelper(mdg);
 XMLDocumentPtr doc =
   xmlHelper->createDocument(doObj, "http://swanandmokashi.com", "GetQuotes");
 xmlHelper->save(doc, "out.xml");
 Will produce the invalid instance document:
 <?xml version="1.0" encoding="UTF-8"?>
 <tns:GetQuotes xmlns:tns="http://swanandmokashi.com" 
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><QuoteTicker>IBM</QuoteTicker></tns:GetQuotes>
 The element QuoteTicker should be namespace qualified.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Resolved: (TUSCANY-1440) [SDO Native] Windows compilation issues

2007-07-17 Thread Pete Robbins (JIRA)

 [ 
https://issues.apache.org/jira/browse/TUSCANY-1440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pete Robbins resolved TUSCANY-1440.
---

Resolution: Fixed

fixed

 [SDO Native] Windows compilation issues
 ---

 Key: TUSCANY-1440
 URL: https://issues.apache.org/jira/browse/TUSCANY-1440
 Project: Tuscany
  Issue Type: Bug
  Components: C++ SDO
Affects Versions: Cpp-M3
 Environment: Windows
Reporter: Brady Johnson
Priority: Minor
 Fix For: Cpp-Next


 The commonly used #define int64_t in SDO native may conflict with existing 
 software packages on Windows since it doesn't use #ifndef.
 The define is in: runtime/core/src/commonj/sdo/export.h
 It should be changed as follows:
 #ifndef int64_t
 #define int64_t __int64
 #endif
 I can apply a patch if necessary, but this is a simple change.
 
 Brady Johnson
 Lead Software Developer - HydraSCA
 Rogue Wave Software - [EMAIL PROTECTED]

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Resolved: (TUSCANY-1439) All classes derived from ReferenceBinding implement getTargetServiceBinding()

2007-07-17 Thread Pete Robbins (JIRA)

 [ 
https://issues.apache.org/jira/browse/TUSCANY-1439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pete Robbins resolved TUSCANY-1439.
---

Resolution: Fixed

Patch applied. One minor change to RubyReferenceBinding was required to fix a 
compile error.

 All classes derived from ReferenceBinding implement getTargetServiceBinding()
 -

 Key: TUSCANY-1439
 URL: https://issues.apache.org/jira/browse/TUSCANY-1439
 Project: Tuscany
  Issue Type: Bug
  Components: C++ SCA
Affects Versions: Cpp-M3
 Environment: all platforms
Reporter: Brady Johnson
Priority: Minor
 Fix For: Cpp-Next

 Attachments: tuscany_jira1439


 All classes derived from ReferenceBinding implement 
 getTargetServiceBinding(). This method should be moved up to 
 ReferenceBinding. Doing so greatly simplifies walking the SCA Model hierarchy 
 and retrieving the ServiceWrapper.
 Affected sub-classes are:
   CompositeReferenceBinding
   CPPReferenceBinding
   PHPReferenceBinding
   PythonReferenceBinding
   RESTReferenceBinding
   RubyReferenceBinding
   WSReferenceBinding
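 For illustration only, a minimal self-contained sketch of the proposed 
 refactoring; the classes below are stand-ins, not Tuscany's actual 
 ReferenceBinding hierarchy:

  // Stand-in types for illustration; not Tuscany's real classes.
  #include <iostream>

  class ServiceBinding {};

  // After the change, the accessor lives once on the common base class
  // instead of being re-implemented in every *ReferenceBinding subclass.
  class ReferenceBinding {
  public:
      ReferenceBinding() : target(0) {}
      virtual ~ReferenceBinding() {}
      ServiceBinding* getTargetServiceBinding() const { return target; }
      void setTargetServiceBinding(ServiceBinding* t) { target = t; }
  private:
      ServiceBinding* target;
  };

  class CPPReferenceBinding : public ReferenceBinding {};

  int main() {
      ServiceBinding sb;
      CPPReferenceBinding ref;
      ref.setTargetServiceBinding(&sb);
      // Model-walking code can now treat any ReferenceBinding* uniformly.
      ReferenceBinding* base = &ref;
      std::cout << (base->getTargetServiceBinding() == &sb) << std::endl;
      return 0;
  }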

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Commented: (TUSCANY-1423) There are no tools to verify or display tuscany services

2007-07-17 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/TUSCANY-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513219
 ] 

Pete Robbins commented on TUSCANY-1423:
---

I have applied the latest attachment

 There are no tools to verify or display tuscany services
 

 Key: TUSCANY-1423
 URL: https://issues.apache.org/jira/browse/TUSCANY-1423
 Project: Tuscany
  Issue Type: New Feature
  Components: C++ SCA
Affects Versions: Cpp-M3
 Environment: All platforms
Reporter: Brady Johnson
Priority: Minor
 Fix For: Cpp-Next

 Attachments: build.xml, main.cpp, TuscanyServiceLoader.cpp, 
 TuscanyServiceLoader.cpp.model, TuscanyServiceLoader.h


 According to Simon Laws' thread "SCA Toys?" posted June 21, 2007, it would be 
 very useful to have a set of utilities or toys for TuscanySCA.
 I have created a toy for TuscanySCA C++ that loads tuscany services via 
 SCARuntime() and displays their information. This would be very
 useful for service development and verification.
 Following is the application's help usage:
 [EMAIL PROTECTED] bin]$ ./tuscanyServiceLoader -h
 Usage
 tuscanyDriver
 -ir Mandatory: Installation root where extensions are located: 
 ${TUSCANY_SCACPP}
 -sr Mandatory: System root where projects are located: 
 ${TUSCANY_SCACPP}/samples
 -sp Optional: System path
 -uri Optional: Base URI
 -dc Optional: Default Component name
 -model Optional: Display SCA Model Hierarchy
 -wsdl Optional: Display WSDL information
 -v Optional: Same as specifying both: -model and -wsdl
 
 Brady Johnson
 Lead Software Developer - HydraSCA
 Rogue Wave Software - [EMAIL PROTECTED]

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Resolved: (TUSCANY-1422) Add a method to get component names from a composite

2007-07-12 Thread Pete Robbins (JIRA)

 [ 
https://issues.apache.org/jira/browse/TUSCANY-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pete Robbins resolved TUSCANY-1422.
---

   Resolution: Fixed
Fix Version/s: Cpp-Next

patch applied

 Add a method to get component names from a composite
 

 Key: TUSCANY-1422
 URL: https://issues.apache.org/jira/browse/TUSCANY-1422
 Project: Tuscany
  Issue Type: New Feature
  Components: C++ SCA
Affects Versions: Cpp-M3
Reporter: Justin Thomas
 Fix For: Cpp-Next

 Attachments: Composite_jira1422


 While navigating the model hierarchy, it would be helpful to get a list of 
 all components of a given composite.  This way the developer will not need to 
 know the specific component names in order to get them using 
 Composite::findComponent().
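 By way of illustration, a minimal self-contained sketch of the idea; the 
 method name getComponentNames() and the Composite/Component stand-ins below 
 are hypothetical, not Tuscany's actual code:

  #include <iostream>
  #include <list>
  #include <map>
  #include <string>

  class Component {};

  class Composite {
  public:
      void addComponent(const std::string& name) { components[name] = Component(); }

      // Existing-style lookup: the caller must already know the name.
      Component* findComponent(const std::string& name) {
          std::map<std::string, Component>::iterator it = components.find(name);
          return it == components.end() ? 0 : &it->second;
      }

      // Proposed-style enumeration: expose the names so callers can walk
      // the model without prior knowledge of the component names.
      std::list<std::string> getComponentNames() const {
          std::list<std::string> names;
          std::map<std::string, Component>::const_iterator it;
          for (it = components.begin(); it != components.end(); ++it) {
              names.push_back(it->first);
          }
          return names;
      }

  private:
      std::map<std::string, Component> components;
  };

  int main() {
      Composite composite;
      composite.addComponent("StockQuoteComponent");
      composite.addComponent("AccountComponent");
      std::list<std::string> names = composite.getComponentNames();
      for (std::list<std::string>::iterator it = names.begin(); it != names.end(); ++it) {
          std::cout << *it << std::endl;
      }
      return 0;
  }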

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Resolved: (TUSCANY-1427) TuscanySCA C++ SVN head does not compile: WSDLMessagePart class is missing

2007-07-12 Thread Pete Robbins (JIRA)

 [ 
https://issues.apache.org/jira/browse/TUSCANY-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pete Robbins resolved TUSCANY-1427.
---

Resolution: Fixed

I've added the missing files to svn tagged against the original Jira 1402

 TuscanySCA C++ SVN head does not compile: WSDLMessagePart class is missing
 --

 Key: TUSCANY-1427
 URL: https://issues.apache.org/jira/browse/TUSCANY-1427
 Project: Tuscany
  Issue Type: Bug
  Components: C++ SCA
Affects Versions: Cpp-Next
 Environment: All platforms
Reporter: Brady Johnson
Priority: Blocker
 Fix For: Cpp-Next


 I recently submitted a patch for a JIRA which involved adding a new class to the 
 model module.
 https://issues.apache.org/jira/browse/TUSCANY-1402
 It seems like the WSDLMessagePart class that was attached to the JIRA was not 
 added to subversion.
 If someone downloads the HEAD for cpp, the class is not there and Tuscany 
 doesn't compile.
 
 Brady Johnson
 Lead Software Developer - HydraSCA
 Rogue Wave Software - [EMAIL PROTECTED]

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Commented: (TUSCANY-1112) Incorrect namespaces in generated XML

2007-07-06 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/TUSCANY-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12510653
 ] 

Pete Robbins commented on TUSCANY-1112:
---

I've fixed this (I hope) in the branch. Please let me know if this works for 
you. I will then apply the fix in HEAD

 Incorrect namespaces in generated XML
 -

 Key: TUSCANY-1112
 URL: https://issues.apache.org/jira/browse/TUSCANY-1112
 Project: Tuscany
  Issue Type: Bug
  Components: C++ SDO
Affects Versions: Cpp-Next
 Environment: WinXP
Reporter: Matthew Peters
 Fix For: Cpp-Next


 Please excuse the fact that I have only a PHP testcase for this. The PHP is 
 however pretty trivial and it seems a simple thing to make in C. Also, I know 
 that the PHP layer is doing very little to interfere, so this is genuine 
 Tuscany behaviour.
 Here is the bug report from the PHP bug tracking system:
 Description:
 
 I have been quite sceptical about the XML that SDO is producing when it
 builds a SOAP request, especially w.r.t. the namespaces. So I tried
 loading the XML that SDO is producing into Java XERCES with validation
 on. There are several problems with the XML generated, I think.
 Using the two xsds that are in the reproduce section below, and the
 short PHP script also there, SDO generates:
 <?xml version="1.0" encoding="UTF-8"?>
 <BOGUS xmlns="http://Component" xmlns:tns="http://Component"
 xmlns:tns2="http://www.test.com/info"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="add">
   <person>
 <tns2:name>
   <first>Will</first>
   <last>Shakespeare</last>
 </tns2:name>
   </person>
 </BOGUS>
 There are three (!) things wrong with this.
 1. XERCES will not accept the xsi:type=add. I do not really know why.
 I assume this is because there is no type called add, it's only an
 element. So I do not think this should be coming out. 
 2. name should not be in tns2=http://www.test.com/info, neither should
 first and last be in the default namespace of http://Component. The
 person.xsd has no elementFormDefault, so the elements below person
 should all be in the no-name namespace. 
 3.You have to change the person.xsd to see the third thing: put
 ElementNameDefault=qualified in
 the person schema, then name, first and last should all now be
 coming out in the http://www.test.com/info namespace, but it makes no
 difference to the generated XML. 
 Reproduce code:
 ---
 <?php
 $xmldas = SDO_DAS_XML::create('types.xsd');
 $person =
 $xmldas->createDataObject('http://www.test.com/info', 'personType');
 $name = $person->createDataObject('name');
 $name->first = "Will";
 $name->last  = "Shakespeare";
 $add = $xmldas->createDataObject('http://Component', 'add');
 $add->person = $person;
 $xdoc   = $xmldas->createDocument('', 'BOGUS', $add);
 $xmlstr = $xmldas->saveString($xdoc, 2);
 echo $xmlstr;
 ?>
 types.xsd:
 <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" 
   xmlns:ns0="http://www.test.com/info"
   targetNamespace="http://Component"
   elementNameDefault="qualified">
   <xs:import schemaLocation="person.xsd"
 namespace="http://www.test.com/info"/>
   <xs:element name="add">
 <xs:complexType>
   <xs:sequence>
 <xs:element name="person" type="ns0:personType"
 nillable="true"/>
   </xs:sequence>
 </xs:complexType>
   </xs:element>
 </xs:schema>
 person.xsd:
 <?xml version="1.0" encoding="UTF-8"?>
 <schema xmlns="http://www.w3.org/2001/XMLSchema" 
 targetNamespace="http://www.test.com/info" 
 xmlns:info="http://www.test.com/info">
 <complexType name="nameType">
   <sequence>
   <element name="first" type="string"></element>
   <element name="last" type="string"></element>
   </sequence>
   </complexType>
   <complexType name="personType">
   <sequence>
   <element name="name" type="info:nameType"></element>
   </sequence>
   </complexType>  
 </schema>
 Expected result:
 
 see above
 Actual result:
 --
 see above
 [2007-01-31 12:21 UTC] mfp at php dot net
 I just came across what I think is another example of this. Now that I
 understand better how namespaces work, I suspect it is more common than
 we realise. 
 Here's the example in a nutshell:
 Catalog.xsd defines a catalog element in the catalogNS namespace, which
 contains items defined in a different namespace in a different file,
 Order.xsd:
 <schema xmlns="http://www.w3.org/2001/XMLSchema"
  xmlns:cat="catalogNS" xmlns:ord="orderNS" targetNamespace="catalogNS">
   <include schemaLocation="Order.xsd"/>
   <element name="catalog" type="cat:CatalogType"/>
   <complexType name="CatalogType">
 <sequence>
   <element maxOccurs="unbounded" ref="ord:item"/>
 </sequence>
   </complexType>
 </schema>
 Order.xsd defines the item element as being in the OrderNS namespace:
 .../...
 

[jira] Commented: (TUSCANY-1402) TuscanySCA C++: WSDLOperation class provides no access to the input and output Operation Message information

2007-07-06 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/TUSCANY-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12510659
 ] 

Pete Robbins commented on TUSCANY-1402:
---

The WSDLMessagePart.h and .cpp are missing from the patch.

 TuscanySCA C++: WSDLOperation class provides no access to the input and 
 output Operation Message information
 

 Key: TUSCANY-1402
 URL: https://issues.apache.org/jira/browse/TUSCANY-1402
 Project: Tuscany
  Issue Type: Bug
  Components: C++ SCA
Affects Versions: Cpp-M3
 Environment: All platforms
Reporter: Brady Johnson
Priority: Minor
 Fix For: Cpp-Next

 Attachments: tuscany_patch_jira_tuscany1402


 Currently there is no way to obtain WSDL Operation Message information as 
 defined in a WSDL. The WSDLOperation class does have 2 attributes that seem 
 to help serve this purpose, but they are never populated.
 commonj::sdo::DataObjectPtr inputMessage;
 commonj::sdo::DataObjectPtr outputMessage;
 I would like to propose adding several methods related to the input/output 
 messages as follows:
  // Called from WSDLDefinition, populates
  // std::map<std::string, tuscany::sca::model::WSDLMessagePart> partMap,
  // which will map part names to message parts
  void WSDLOperation::setInputMessage( commonj::sdo::DataObjectPtr msg );
  void WSDLOperation::setOutputMessage( commonj::sdo::DataObjectPtr msg );
  // Allows you to iterate over the input/output message parts
  // Initially for Document wrapped, there will only be one part
  std::list<std::string> WSDLOperation::getInputMessagePartNames();
  std::list<std::string> WSDLOperation::getOutputMessagePartNames();
  // Allows you to get the actual input/output message part
  // Initially for Document wrapped, there will only be one part
  tuscany::sca::model::WSDLMessagePart WSDLOperation::getInputMessagePart( 
  std::string msgPartName );
  tuscany::sca::model::WSDLMessagePart WSDLOperation::getOutputMessagePart( 
  std::string msgPartName );
  // Currently WSDLOperation specifies encoding for the entire operation,
  // when actually it should be specified separately for both the input
  // AND the output message
  void WSDLMessagePart::setInputEncoded( bool ); // replaces setEncoded
  void WSDLMessagePart::setOutputEncoded( bool ); // replaces setEncoded
  bool WSDLMessagePart::isInputEncoded(); // replaces isEncoded
  bool WSDLMessagePart::isOutputEncoded(); // replaces isEncoded
 The WSDLMessagePart class would have the following API:
 WSDLMessagePart::WSDLMessagePart( std::string partName, std::string 
 partType, std::string partUri );
 std::string WSDLMessagePart::getPartName();
 void WSDLMessagePart::setPartName( std::string uri );
 std::string WSDLMessagePart::getPartType();
 void WSDLMessagePart::setPartType( std::string name );
 std::string WSDLMessagePart::getPartUri();
 void WSDLMessagePart::setPartUri( std::string uri );
 This would be a good step towards making Tuscany support Document AND RPC 
 SOAP message binding styles.
 If everyone agrees with these changes, I can provide a patch shortly.
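 For context, a sketch of how the proposed accessors might be used from a 
 container; it follows the method names proposed in this issue, and the 
 include paths and surrounding setup are assumptions rather than the 
 committed API:

  #include <list>
  #include <string>
  #include "tuscany/sca/model/WSDLDefinition.h"   // assumed header locations
  #include "tuscany/sca/model/WSDLOperation.h"
  #include "tuscany/sca/model/WSDLMessagePart.h"

  using namespace tuscany::sca::model;

  void dumpOperationParts(WSDLDefinition* wsdlDef,
                          const std::string& portTypeName,
                          const std::string& operName)
  {
      WSDLOperation wsdlOp = wsdlDef->findOperation(portTypeName, operName);

      // Enumerate the input message parts (a single part for document-wrapped).
      std::list<std::string> inParts = wsdlOp.getInputMessagePartNames();
      for (std::list<std::string>::iterator it = inParts.begin();
           it != inParts.end(); ++it) {
          WSDLMessagePart part = wsdlOp.getInputMessagePart(*it);
          // part.getPartName(), getPartType() and getPartUri() describe the part.
      }
  }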
 
 Brady Johnson
 Lead Software Developer - HydraSCA
 Rogue Wave Software - [EMAIL PROTECTED]

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Resolved: (TUSCANY-1402) TuscanySCA C++: WSDLOperation class provides no access to the input and output Operation Message information

2007-07-06 Thread Pete Robbins (JIRA)

 [ 
https://issues.apache.org/jira/browse/TUSCANY-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pete Robbins resolved TUSCANY-1402.
---

Resolution: Fixed

Patch applied and files added

 TuscanySCA C++: WSDLOperation class provides no access to the input and 
 output Operation Message information
 

 Key: TUSCANY-1402
 URL: https://issues.apache.org/jira/browse/TUSCANY-1402
 Project: Tuscany
  Issue Type: Bug
  Components: C++ SCA
Affects Versions: Cpp-M3
 Environment: All platforms
Reporter: Brady Johnson
Priority: Minor
 Fix For: Cpp-Next

 Attachments: tuscany_patch_jira_tuscany1402, WSDLMessagePart.cpp, 
 WSDLMessagePart.h


 Currently there is no way to obtain WSDL Operation Message information as 
 defined in a WSDL. The WSDLOperation class does have 2 attributes that seem 
 to help serve this purpose, but they are never populated.
 commonj::sdo::DataObjectPtr inputMessage;
 commonj::sdo::DataObjectPtr outputMessage;
 I would like to propose adding several methods related to the input/output 
 messages as follows:
  // Called from WSDLDefinition, populates
  // std::map<std::string, tuscany::sca::model::WSDLMessagePart> partMap,
  // which will map part names to message parts
  void WSDLOperation::setInputMessage( commonj::sdo::DataObjectPtr msg );
  void WSDLOperation::setOutputMessage( commonj::sdo::DataObjectPtr msg );
  // Allows you to iterate over the input/output message parts
  // Initially for Document wrapped, there will only be one part
  std::list<std::string> WSDLOperation::getInputMessagePartNames();
  std::list<std::string> WSDLOperation::getOutputMessagePartNames();
  // Allows you to get the actual input/output message part
  // Initially for Document wrapped, there will only be one part
  tuscany::sca::model::WSDLMessagePart WSDLOperation::getInputMessagePart( 
  std::string msgPartName );
  tuscany::sca::model::WSDLMessagePart WSDLOperation::getOutputMessagePart( 
  std::string msgPartName );
  // Currently WSDLOperation specifies encoding for the entire operation,
  // when actually it should be specified separately for both the input
  // AND the output message
  void WSDLMessagePart::setInputEncoded( bool ); // replaces setEncoded
  void WSDLMessagePart::setOutputEncoded( bool ); // replaces setEncoded
  bool WSDLMessagePart::isInputEncoded(); // replaces isEncoded
  bool WSDLMessagePart::isOutputEncoded(); // replaces isEncoded
 The WSDLMessagePart class would have the following API:
 WSDLMessagePart::WSDLMessagePart( std::string partName, std::string 
 partType, std::string partUri );
 std::string WSDLMessagePart::getPartName();
 void WSDLMessagePart::setPartName( std::string uri );
 std::string WSDLMessagePart::getPartType();
 void WSDLMessagePart::setPartType( std::string name );
 std::string WSDLMessagePart::getPartUri();
 void WSDLMessagePart::setPartUri( std::string uri );
 This would be a good step towards making Tuscany support Document AND RPC 
 SOAP message binding styles.
 If everyone agrees with these changes, I can provide a patch shortly.
 
 Brady Johnson
 Lead Software Developer - HydraSCA
 Rogue Wave Software - [EMAIL PROTECTED]

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Resolved: (TUSCANY-1386) TuscanySCA C++: The usage of the std::map operator[] in Composite::findWSDLDefinition() could cause problems

2007-06-27 Thread Pete Robbins (JIRA)

 [ 
https://issues.apache.org/jira/browse/TUSCANY-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pete Robbins resolved TUSCANY-1386.
---

Resolution: Fixed

patch applied

 TuscanySCA C++: The usage of the std::map operator[] in 
 Composite::findWSDLDefinition() could cause problems
 

 Key: TUSCANY-1386
 URL: https://issues.apache.org/jira/browse/TUSCANY-1386
 Project: Tuscany
  Issue Type: Improvement
  Components: C++ SCA
Affects Versions: Cpp-M3
 Environment: All platforms
Reporter: Brady Johnson
 Fix For: Cpp-Next

 Attachments: Composite_cpp_jira1386


 The std::map<K,V>::operator[] will insert a blank object of type V if the key 
 is not found in the map.
 The method Composite::findWSDLDefinition() simply returns 
 wsdlDefinitions[wsdlNamespace]
 which will insert an entry in the map. This could cause problems if the map 
 was previously empty.
 This could also cause problems if type V is not a pointer, which it is in 
 this case.
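 A self-contained illustration of the pitfall described above (the 
 WSDLDefinition type here is a stand-in, not Tuscany's class):

  #include <iostream>
  #include <map>
  #include <string>

  struct WSDLDefinition {};

  int main() {
      std::map<std::string, WSDLDefinition*> wsdlDefinitions;

      // operator[] on a missing key default-constructs a value (here a null
      // pointer) and inserts it, silently growing the map.
      WSDLDefinition* viaIndex = wsdlDefinitions["http://example.org/missing"];
      std::cout << "size after operator[]: " << wsdlDefinitions.size()
                << ", value is null: " << (viaIndex == 0) << std::endl;

      // find() leaves the map untouched and lets the caller report "not found".
      std::map<std::string, WSDLDefinition*>::iterator it =
          wsdlDefinitions.find("http://example.org/other");
      if (it == wsdlDefinitions.end()) {
          std::cout << "not found, size still " << wsdlDefinitions.size() << std::endl;
      }
      return 0;
  }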
 
 Brady Johnson
 Lead Software Developer - HydraSCA
 Rogue Wave Software - [EMAIL PROTECTED]

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Resolved: (TUSCANY-1383) Tuscany SCA native/C++ : Ability to query the runtime for its loaded operations

2007-06-27 Thread Pete Robbins (JIRA)

 [ 
https://issues.apache.org/jira/browse/TUSCANY-1383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pete Robbins resolved TUSCANY-1383.
---

Resolution: Fixed

patches applied

 Tuscany SCA native/C++ : Ability to query the runtime for its loaded 
 operations
 ---

 Key: TUSCANY-1383
 URL: https://issues.apache.org/jira/browse/TUSCANY-1383
 Project: Tuscany
  Issue Type: Improvement
  Components: C++ SCA
Affects Versions: Cpp-M3
 Environment: API enhancement - all platforms
Reporter: Brady Johnson
Priority: Minor
 Fix For: Cpp-Next

 Attachments: Composite_cpp_jira1383, Composite_h_jira1383, 
 WSDLDefinition_cpp_jira1383, WSDLDefinition_h_jira1383


 I'm investigating using TuscanySCA (C++ version) with a container other 
 than Axis. To do this, the container needs to be able to obtain the 
 WSDL operations, types, etc loaded by Tuscany. I would like to propose 
 extending some of the Tuscany APIs to allow them to be queried, since 
 currently you can only do a find with a known operation name. These 
 query operations could be done either by returning an iterator to the
 internal map, or by just returning a list of map's value strings. The
 second option would probably be safer and more thread-safe. 
 Here are the additions that I propose adding:
 runtime/core/src/tuscany/sca/model/Composite.h
std::list<std::string> getIncludedComposites(); 
Composite* findIncludedComposite(const std::string compositeName);
std::list<std::string> Composite::getWSDLNamespaces();
Change 
  std::vector<Composite*> includes;
To 
  std::map<std::string, Composite> includes;
 runtime/core/src/tuscany/sca/model/WSDLDefinition.h
std::list<std::string> WSDLDefinition::getPortTypes();
std::list<std::string> WSDLDefinition::getOperations( const std::string 
 portTypeName );
 The suggested usage of and rationale behind these additions is as follows:
 Once the projects have been loaded by calling:
tuscany::sca::SCARuntime::initializeSharedRuntime( .. )
 The system composite can then be obtained by calling:
tuscany::sca::model::Composite* SCARuntime::getSystem();
 The system composite doesn't usually contain much other than included 
 composites, so first iterate
 over the composites included in the system composite with these additions to 
 the Composite class:
std::list<std::string> compositeList = 
 systemComposite->getIncludedComposites();
tuscany::sca::model::Composite* includedComposite = 
 findIncludedComposite(const std::string compositeName);
 In order to make this easier, the tuscany::sca::model::Composite::includes 
 data member should be changed from a
 vector to a map, which would map from the composite name to the composite. As 
 it is now, it would be necessary to
 return the actual includes vector, which isn't generally a good idea since 
 users could inadvertently corrupt it.
 For each included composite, this addition to the Composite class would allow 
 you to get all of the 
 WSDL namespaces loaded for a Composite.
std::list<std::string> wsdlNSList = includedComposite->getWSDLNamespaces();
 The WSDLDefinition can then be obtained by calling:
WSDLDefinition* wsdlDef = includedComposite->findWSDLDefinition( 
 wsdlNamespace );
 Then, for each WSDLDefinition, you can iterate over all of the WSDL PortTypes 
 and PortType operations
 with these additions to the WSDLDefinition class:
std::list<std::string> wsdlPortTypeList = wsdlDef->getPortTypes();
std::list<std::string> wsdlPortTypeOpList = wsdlDef->getOperations( 
 portTypeName );
 Now get the WSDLOperation:
WSDLOperation wsdlOp = wsdlDef->findOperation( portTypeName, operName );
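 Putting the walkthrough together, a sketch of the nested iteration a 
 container might perform; the include paths and exact runtime accessors are 
 assumptions based on the proposal above, not a definitive implementation:

  #include <list>
  #include <string>
  #include "tuscany/sca/core/SCARuntime.h"        // assumed header locations
  #include "tuscany/sca/model/Composite.h"
  #include "tuscany/sca/model/WSDLDefinition.h"

  using namespace tuscany::sca;
  using namespace tuscany::sca::model;

  void dumpLoadedWsdl()
  {
      // Assumes SCARuntime::initializeSharedRuntime(..) has already been called.
      Composite* systemComposite = SCARuntime::getSystem();

      std::list<std::string> composites = systemComposite->getIncludedComposites();
      for (std::list<std::string>::iterator c = composites.begin();
           c != composites.end(); ++c) {
          Composite* included = systemComposite->findIncludedComposite(*c);

          std::list<std::string> namespaces = included->getWSDLNamespaces();
          for (std::list<std::string>::iterator ns = namespaces.begin();
               ns != namespaces.end(); ++ns) {
              WSDLDefinition* wsdlDef = included->findWSDLDefinition(*ns);

              std::list<std::string> portTypes = wsdlDef->getPortTypes();
              for (std::list<std::string>::iterator pt = portTypes.begin();
                   pt != portTypes.end(); ++pt) {
                  std::list<std::string> ops = wsdlDef->getOperations(*pt);
                  // Each operation can then be resolved with
                  // wsdlDef->findOperation(*pt, operationName) as shown above.
              }
          }
      }
  }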
 I'm not currently a TuscanySCA contributor, so can someone please submit the 
 attached patch?
 Thanks
 
 Brady Johnson
 Rogue Wave Software - [EMAIL PROTECTED]
 Lead Software Developer - HydraSCA

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


