[jira] [Commented] (SPARK-20176) Spark Dataframe UDAF issue

2017-04-02 Thread Dinesh Man Amatya (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15953025#comment-15953025
 ] 

Dinesh Man Amatya commented on SPARK-20176:
---

The following is shortened code that reproduces the above error. There are three 
files: Test.scala, TestUdaf.scala, and testData.csv.




#Test.scala


import org.apache.spark.SparkContext
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types._
import org.scalatest.{BeforeAndAfterEach, FunSuite}

/**
  * Created by damatya on 4/3/17.
  */
class Test extends FunSuite with BeforeAndAfterEach {

  var sparkSession: SparkSession = _
  var sc: SparkContext = _

  override def beforeEach() {
    sparkSession = SparkSession.builder().appName("udf testings")
      .master("local")
      .config("", "")
      .getOrCreate()
    sc = sparkSession.sparkContext
  }

  override def afterEach() {
    sparkSession.stop()
  }

  test("test total") {
    val sqlContext = sparkSession.sqlContext

    val dataRdd =
      sc.textFile("/opt/projects/pa/DasBackend/SparkEngine/src/main/resources/testData.csv")

    val schemaString = "memberId;paidAmt;allowedAmt"

    val schema = StructType(schemaString.split(";").map(fieldName =>
      StructField(fieldName, StringType, true)))

    val rowRdd = dataRdd.map { line => line.split(";", -1) }.map { array =>
      Row.fromSeq(array.toSeq)
    }

    val dataFrame = sqlContext.createDataFrame(rowRdd, schema)

    val testUdaf: TestUdaf = new TestUdaf(schema)

    val resultDataFrame =
      dataFrame.groupBy("memberId")
        .agg(testUdaf(dataFrame.columns.map(dataFrame(_)): _*).as("totalAmountPair"))

    resultDataFrame.show(false)

    dataFrame.show()
  }

}







#TestUdaf.scala


import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types.{DataType, DoubleType, StructType}


/**
  * Created by damatya on 4/3/17.
  */
class TestUdaf(inputSch: StructType) extends UserDefinedAggregateFunction {

  override def inputSchema: StructType = inputSch

  override def bufferSchema: StructType = new StructType()
    .add("totalRxPaid", DoubleType)
    .add("totalRxAllowedAmt", DoubleType)

  override def dataType: DataType = new StructType()
    .add("totalRxPaid", DoubleType)
    .add("totalRxAllowedAmt", DoubleType)

  override def deterministic: Boolean = false

  override def initialize(buffer: MutableAggregationBuffer): Unit = {
    buffer.update(0, 0D)
    buffer.update(1, 0D)
  }

  override def update(buffer: MutableAggregationBuffer, input: Row): Unit = {

    var paidAmount: Float = 0f
    var allowedAmount: Float = 0f

    try {
      paidAmount = input.getFloat(1)
      allowedAmount = input.getFloat(2)
    } catch {
      case e: Exception =>
        println("invalid amount")
    }

    val totalPaidAmount = buffer.getDouble(0) + paidAmount
    val totalAllowedAmount = buffer.getDouble(1) + allowedAmount

    buffer.update(0, totalPaidAmount)
    buffer.update(1, totalAllowedAmount)
  }

  override def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {
    buffer1.update(0, buffer1.getDouble(0) + buffer2.getDouble(0))
    buffer1.update(1, buffer1.getDouble(1) + buffer2.getDouble(1))
  }

  override def evaluate(buffer: Row): Any = {
    (buffer.getDouble(0), buffer.getDouble(1))
  }
}





#testData.csv

m123;10.5;1
m123;20;10
m11;10;1
m11;30;1
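
As a point of comparison, here is a minimal sketch (not part of the original report) that computes the same per-member totals with Spark's built-in aggregates after casting the string columns to double. It assumes the reporter's semicolon-separated file and column names, and it sidesteps the custom UDAF rather than fixing the codegen error, so treat it as a workaround sketch only.

{code}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, struct, sum}

object TotalsWithBuiltins {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("totals sketch").master("local").getOrCreate()

    // Read the semicolon-separated file and name the columns as in the report
    // (the path is illustrative).
    val df = spark.read
      .option("sep", ";")
      .csv("src/main/resources/testData.csv")
      .toDF("memberId", "paidAmt", "allowedAmt")

    // Cast the amount columns to double, aggregate with built-in sums, and pack
    // the two totals into one struct column, mirroring the UDAF's StructType result.
    val result = df
      .groupBy("memberId")
      .agg(
        sum(col("paidAmt").cast("double")).as("totalRxPaid"),
        sum(col("allowedAmt").cast("double")).as("totalRxAllowedAmt"))
      .select(col("memberId"),
        struct(col("totalRxPaid"), col("totalRxAllowedAmt")).as("totalAmountPair"))

    result.show(false)
    spark.stop()
  }
}
{code}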

> Spark Dataframe UDAF issue
> --
>
> Key: SPARK-20176
> URL: https://issues.apache.org/jira/browse/SPARK-20176
> Project: Spark
>  Issue Type: IT Help
>  Components: Spark Core
>Affects Versions: 2.0.2
>Reporter: Dinesh Man Amatya
>
> Getting the following error in a custom UDAF:
> Error while decoding: java.util.concurrent.ExecutionException: 
> java.lang.Exception: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 58, Column 33: Incompatible expression types "boolean" and "java.lang.Boolean"
> /* 001 */ public java.lang.Object generate(Object[] references) {
> /* 002 */   return new SpecificSafeProjection(references);
> /* 003 */ }
> /* 004 */
> /* 005 */ class SpecificSafeProjection extends 
> org.apache.spark.sql.catalyst.expressions.codegen.BaseProjection {
> /* 006 */
> /* 007 */   private Object[] references;
> /* 008 */   private MutableRow mutableRow;
> /* 009 */   private Object[] values;
> /* 010 */   private Object[] values1;
> /* 011 */   private org.apache.spark.sql.types.StructType schema;
> /* 012 */   private org.apache.spark.sql.types.StructType schema1;
> /* 013 */
> /* 014 */
> /* 015 */   public SpecificSafeProjection(Object[] references) {
> /* 016 */ this.references = references;
> /* 017 */ mutableRow = (MutableRow) references[references.length - 1];
> /* 018 */
> 

[jira] [Commented] (SPARK-20188) Catalog recoverPartitions should allow specifying the database name

2017-04-02 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952997#comment-15952997
 ] 

Xiao Li commented on SPARK-20188:
-

The changes are extensive. I might submit one tomorrow. I found a bug in the 
catalog API; let me fix that first.

> Catalog recoverPartitions should allow specifying the database name
> ---
>
> Key: SPARK-20188
> URL: https://issues.apache.org/jira/browse/SPARK-20188
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Felix Cheung
>
> Currently Catalog.recoverParitions only has a tableName parameter
> https://github.com/apache/spark/blob/9effc2cdcb3d68db8b6b5b3abd75968633b583c8/sql/core/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala#L397
> But it throws an exception when the table is not in the default database.
> Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table 
> or view 'foo' not found in database 'default';
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.requireTableExists(SessionCatalog.scala:154)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.getTableMetadata(SessionCatalog.scala:317)
>   at 
> org.apache.spark.sql.execution.command.AlterTableRecoverPartitionsCommand.run(ddl.scala:563)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
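
A hedged workaround sketch, not taken from the issue: until the catalog API accepts a database-qualified name, the same recovery can be expressed in SQL, which does take one. The table name mydb.foo below is hypothetical.

{code}
// Catalog API today: only an unqualified table name, resolved against the current database.
// spark.catalog.recoverPartitions("foo")

// SQL path accepts a qualified name (mydb.foo is a made-up example):
spark.sql("ALTER TABLE mydb.foo RECOVER PARTITIONS")
{code}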






[jira] [Resolved] (SPARK-20197) CRAN check fail with package installation

2017-04-02 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung resolved SPARK-20197.
--
Resolution: Fixed

> CRAN check fail with package installation 
> --
>
> Key: SPARK-20197
> URL: https://issues.apache.org/jira/browse/SPARK-20197
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.1.0, 2.1.1
>Reporter: Felix Cheung
>Assignee: Felix Cheung
>
>  Failed 
> -
>   1. Failure: No extra files are created in SPARK_HOME by starting session 
> and making calls (@test_sparkSQL.R#2858)
>   length(sparkRFilesBefore) > 0 isn't true.






[jira] [Commented] (SPARK-20197) CRAN check fail with package installation

2017-04-02 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952948#comment-15952948
 ] 

Felix Cheung commented on SPARK-20197:
--

Merged https://github.com/apache/spark/pull/17515 to branch-2.1; 
will need to follow up on master separately.


> CRAN check fail with package installation 
> --
>
> Key: SPARK-20197
> URL: https://issues.apache.org/jira/browse/SPARK-20197
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.1.0, 2.1.1
>Reporter: Felix Cheung
>Assignee: Felix Cheung
>
>  Failed 
> -
>   1. Failure: No extra files are created in SPARK_HOME by starting session 
> and making calls (@test_sparkSQL.R#2858)
>   length(sparkRFilesBefore) > 0 isn't true.






[jira] [Commented] (SPARK-20188) Catalog recoverPartitions should allow specifying the database name

2017-04-02 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952905#comment-15952905
 ] 

Xiao Li commented on SPARK-20188:
-

Yes! In different Catalog APIs, we have different input formats. We need to 
rename them and provide clear comments. Will submit a PR tonight.

> Catalog recoverPartitions should allow specifying the database name
> ---
>
> Key: SPARK-20188
> URL: https://issues.apache.org/jira/browse/SPARK-20188
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Felix Cheung
>
> Currently Catalog.recoverParitions only has a tableName parameter
> https://github.com/apache/spark/blob/9effc2cdcb3d68db8b6b5b3abd75968633b583c8/sql/core/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala#L397
> But it throws an exception when the table is not in the default database.
> Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table 
> or view 'foo' not found in database 'default';
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.requireTableExists(SessionCatalog.scala:154)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.getTableMetadata(SessionCatalog.scala:317)
>   at 
> org.apache.spark.sql.execution.command.AlterTableRecoverPartitionsCommand.run(ddl.scala:563)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)






[jira] [Commented] (SPARK-18570) Consider supporting other R formula operators

2017-04-02 Thread Krishna Kalyan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952887#comment-15952887
 ] 

Krishna Kalyan commented on SPARK-18570:


[~felixcheung] I am currently not working on this. Sorry for the inconvenience. 

> Consider supporting other R formula operators
> -
>
> Key: SPARK-18570
> URL: https://issues.apache.org/jira/browse/SPARK-18570
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, SparkR
>Reporter: Felix Cheung
>Priority: Minor
>
> Such as
> {code}
> ∗ 
>  X∗Y include these variables and the interactions between them
> ^
>  (X + Z + W)^3 include these variables and all interactions up to three way
> |
>  X | Z conditioning: include x given z
> {code}
> Others include %in% and ` (backtick):
> https://stat.ethz.ch/R-manual/R-devel/library/stats/html/formula.html






[jira] [Commented] (SPARK-20197) CRAN check fail with package installation

2017-04-02 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952885#comment-15952885
 ] 

Apache Spark commented on SPARK-20197:
--

User 'felixcheung' has created a pull request for this issue:
https://github.com/apache/spark/pull/17516

> CRAN check fail with package installation 
> --
>
> Key: SPARK-20197
> URL: https://issues.apache.org/jira/browse/SPARK-20197
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.1.0, 2.1.1
>Reporter: Felix Cheung
>Assignee: Felix Cheung
>
>  Failed 
> -
>   1. Failure: No extra files are created in SPARK_HOME by starting session 
> and making calls (@test_sparkSQL.R#2858)
>   length(sparkRFilesBefore) > 0 isn't true.






[jira] [Commented] (SPARK-20197) CRAN check fail with package installation

2017-04-02 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952884#comment-15952884
 ] 

Apache Spark commented on SPARK-20197:
--

User 'felixcheung' has created a pull request for this issue:
https://github.com/apache/spark/pull/17515

> CRAN check fail with package installation 
> --
>
> Key: SPARK-20197
> URL: https://issues.apache.org/jira/browse/SPARK-20197
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.1.0, 2.1.1
>Reporter: Felix Cheung
>Assignee: Felix Cheung
>
>  Failed 
> -
>   1. Failure: No extra files are created in SPARK_HOME by starting session 
> and making calls (@test_sparkSQL.R#2858)
>   length(sparkRFilesBefore) > 0 isn't true.






[jira] [Commented] (SPARK-20197) CRAN check fail with package installation

2017-04-02 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952879#comment-15952879
 ] 

Apache Spark commented on SPARK-20197:
--

User 'felixcheung' has created a pull request for this issue:
https://github.com/apache/spark/pull/17514

> CRAN check fail with package installation 
> --
>
> Key: SPARK-20197
> URL: https://issues.apache.org/jira/browse/SPARK-20197
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.1.0, 2.1.1
>Reporter: Felix Cheung
>Assignee: Felix Cheung
>
>  Failed 
> -
>   1. Failure: No extra files are created in SPARK_HOME by starting session 
> and making calls (@test_sparkSQL.R#2858)
>   length(sparkRFilesBefore) > 0 isn't true.






[jira] [Assigned] (SPARK-20197) CRAN check fail with package installation

2017-04-02 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20197:


Assignee: Felix Cheung  (was: Apache Spark)

> CRAN check fail with package installation 
> --
>
> Key: SPARK-20197
> URL: https://issues.apache.org/jira/browse/SPARK-20197
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.1.0, 2.1.1
>Reporter: Felix Cheung
>Assignee: Felix Cheung
>
>  Failed 
> -
>   1. Failure: No extra files are created in SPARK_HOME by starting session 
> and making calls (@test_sparkSQL.R#2858)
>   length(sparkRFilesBefore) > 0 isn't true.






[jira] [Commented] (SPARK-20197) CRAN check fail with package installation

2017-04-02 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952878#comment-15952878
 ] 

Apache Spark commented on SPARK-20197:
--

User 'felixcheung' has created a pull request for this issue:
https://github.com/apache/spark/pull/17513

> CRAN check fail with package installation 
> --
>
> Key: SPARK-20197
> URL: https://issues.apache.org/jira/browse/SPARK-20197
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.1.0, 2.1.1
>Reporter: Felix Cheung
>Assignee: Felix Cheung
>
>  Failed 
> -
>   1. Failure: No extra files are created in SPARK_HOME by starting session 
> and making calls (@test_sparkSQL.R#2858)
>   length(sparkRFilesBefore) > 0 isn't true.






[jira] [Assigned] (SPARK-20197) CRAN check fail with package installation

2017-04-02 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20197:


Assignee: Apache Spark  (was: Felix Cheung)

> CRAN check fail with package installation 
> --
>
> Key: SPARK-20197
> URL: https://issues.apache.org/jira/browse/SPARK-20197
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.1.0, 2.1.1
>Reporter: Felix Cheung
>Assignee: Apache Spark
>
>  Failed 
> -
>   1. Failure: No extra files are created in SPARK_HOME by starting session 
> and making calls (@test_sparkSQL.R#2858)
>   length(sparkRFilesBefore) > 0 isn't true.






[jira] [Updated] (SPARK-20197) CRAN check fail with package installation

2017-04-02 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-20197:
-
Description: 
 Failed 
-
  1. Failure: No extra files are created in SPARK_HOME by starting session and 
making calls (@test_sparkSQL.R#2858)
  length(sparkRFilesBefore) > 0 isn't true.



> CRAN check fail with package installation 
> --
>
> Key: SPARK-20197
> URL: https://issues.apache.org/jira/browse/SPARK-20197
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.1.0, 2.1.1
>Reporter: Felix Cheung
>Assignee: Felix Cheung
>
>  Failed 
> -
>   1. Failure: No extra files are created in SPARK_HOME by starting session 
> and making calls (@test_sparkSQL.R#2858)
>   length(sparkRFilesBefore) > 0 isn't true.






[jira] [Created] (SPARK-20197) CRAN check fail with package installation

2017-04-02 Thread Felix Cheung (JIRA)
Felix Cheung created SPARK-20197:


 Summary: CRAN check fail with package installation 
 Key: SPARK-20197
 URL: https://issues.apache.org/jira/browse/SPARK-20197
 Project: Spark
  Issue Type: Bug
  Components: SparkR
Affects Versions: 2.1.0, 2.1.1
Reporter: Felix Cheung
Assignee: Felix Cheung









[jira] [Commented] (SPARK-20180) Unlimited max pattern length in Prefix span

2017-04-02 Thread Cyril de Vogelaere (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952828#comment-15952828
 ] 

Cyril de Vogelaere commented on SPARK-20180:


I'm not arguing for no max at all, just for a special value (0) that allows a 
user to find all patterns of any length.

Let's take a practical example and say I'm not the brightest user. I have a 
really big dataset I need to analyse, but because I'm not too bright, I don't 
run any analysis on that dataset to know what the maximum possible length of a 
sequence would be. I look at the parameters and tell myself that a max length 
of, say, 50 will be enough. I code quickly and launch it using Spark's algorithm.

I wait a few days for the result, since it was a really big dataset, and I see 
that I have a few solution patterns of length 50. And there is the problem: do 
I now need to re-run everything because there was a longer pattern I wanted to 
find, or was 50 really the limit?

This may waste a lot of time for some people if it happens to them. So I want 
to introduce that special value (0) for the max pattern length, so that when the 
algorithm ends you get all patterns of any length, with no doubt possible.

Now honestly, while I would say this change may be useful, a careful user can 
always set the max pattern length to the maximum Integer value and, given the 
time the algorithm needs to run even on a small dataset, will never have any 
problem, since it would take months to produce a pattern longer than that, even 
with the biggest dataset we can imagine and lots of computing power. So this is 
not a mandatory feature, simply something I feel would be nice to have. Also, in 
case someone really was crazy enough to run this algorithm with the goal of 
finding patterns longer than the maximum integer, that special value would let 
them find those patterns for sure.

I would also like to advocate for setting that special value (0) as the default, 
since the first time I ran a test it was on a dataset that returned long 
patterns (kosarak or protein, I don't remember which of the two), and thanks to 
the default pattern length it finished super quickly and I thought I had a 
performance improvement in comparison to another algorithm that allowed 
unlimited pattern length by default.

It took me two days to realise the default parameter was set to ten (guess I was 
the dumb user there; not my proudest moment, I will admit). So I want to 
advocate for that 0 value as the default, but I get that, for backward 
compatibility, it may not be best to change the default behavior. So I would 
like a senior's opinion on whether changing that would be OK.


Again, we use pull requests to propose changes.
=> I know (I read the contributor guide), but I'm waiting for the tests to 
finish. I tested my code and got some errors, which I'm pretty sure are 
unrelated to the few lines of code I added. I'm relaunching the tests on the 
code without my changes; if the errors are the same (and you feel the changes 
I'm proposing are worth it), I will make the pull request. That way I'm sure I 
don't waste anybody's time reviewing code that nobody (but the inexperienced 
newbie I am) would find useful. 


So there we go, I hope that answers your questions.
If not, drop me another message and I will answer as best I can. :)
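
For context, a minimal sketch (not from this thread) of how MLlib's PrefixSpan is driven today; the sample sequences are made up, and passing a very large value to setMaxPatternLength is the current stand-in for the proposed 0 = unlimited sentinel.

{code}
import org.apache.spark.mllib.fpm.PrefixSpan

// Toy input: each sequence is an array of itemsets (made-up data).
val sequences = sc.parallelize(Seq(
  Array(Array(1, 2), Array(3)),
  Array(Array(1), Array(3, 2), Array(1, 2)),
  Array(Array(1, 2), Array(5)),
  Array(Array(6))
), 2).cache()

val prefixSpan = new PrefixSpan()
  .setMinSupport(0.5)
  .setMaxPatternLength(Int.MaxValue) // today's workaround for "unlimited"; the proposal is 0 as a sentinel

val model = prefixSpan.run(sequences)
model.freqSequences.collect().foreach { freqSequence =>
  println(freqSequence.sequence.map(_.mkString("[", ", ", "]")).mkString("[", ", ", "]") +
    ", " + freqSequence.freq)
}
{code}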


> Unlimited max pattern length in Prefix span
> ---
>
> Key: SPARK-20180
> URL: https://issues.apache.org/jira/browse/SPARK-20180
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 2.1.0
>Reporter: Cyril de Vogelaere
>Priority: Minor
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> Right now, we need to use the .setMaxPatternLength() method to
> specify the maximum pattern length of a sequence. Any pattern longer than 
> that won't be output.
> The current default maxPatternLength value is 10.
> This should be changed so that with input 0, all patterns of any length would 
> be output. Additionally, the default value should be changed to 0, so that 
> a new user could find all patterns in their dataset without looking at this 
> parameter.






[jira] [Assigned] (SPARK-20196) Python to add catalog API for refreshByPath

2017-04-02 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20196:


Assignee: Apache Spark  (was: Felix Cheung)

> Python to add catalog API for refreshByPath
> ---
>
> Key: SPARK-20196
> URL: https://issues.apache.org/jira/browse/SPARK-20196
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 2.2.0
>Reporter: Felix Cheung
>Assignee: Apache Spark
>







[jira] [Commented] (SPARK-20196) Python to add catalog API for refreshByPath

2017-04-02 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952823#comment-15952823
 ] 

Apache Spark commented on SPARK-20196:
--

User 'felixcheung' has created a pull request for this issue:
https://github.com/apache/spark/pull/17512

> Python to add catalog API for refreshByPath
> ---
>
> Key: SPARK-20196
> URL: https://issues.apache.org/jira/browse/SPARK-20196
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 2.2.0
>Reporter: Felix Cheung
>Assignee: Felix Cheung
>







[jira] [Assigned] (SPARK-20196) Python to add catalog API for refreshByPath

2017-04-02 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20196:


Assignee: Felix Cheung  (was: Apache Spark)

> Python to add catalog API for refreshByPath
> ---
>
> Key: SPARK-20196
> URL: https://issues.apache.org/jira/browse/SPARK-20196
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 2.2.0
>Reporter: Felix Cheung
>Assignee: Felix Cheung
>







[jira] [Created] (SPARK-20196) Python to add catalog API for refreshByPath

2017-04-02 Thread Felix Cheung (JIRA)
Felix Cheung created SPARK-20196:


 Summary: Python to add catalog API for refreshByPath
 Key: SPARK-20196
 URL: https://issues.apache.org/jira/browse/SPARK-20196
 Project: Spark
  Issue Type: Bug
  Components: PySpark, SQL
Affects Versions: 2.2.0
Reporter: Felix Cheung
Assignee: Felix Cheung









[jira] [Assigned] (SPARK-20195) SparkR to add createTable catalog API and deprecate createExternalTable

2017-04-02 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20195:


Assignee: Felix Cheung  (was: Apache Spark)

> SparkR to add createTable catalog API and deprecate createExternalTable
> ---
>
> Key: SPARK-20195
> URL: https://issues.apache.org/jira/browse/SPARK-20195
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR, SQL
>Affects Versions: 2.2.0
>Reporter: Felix Cheung
>Assignee: Felix Cheung
>
> Only naming differences for clarity, functionality is already supported with 
> and without path parameter.






[jira] [Commented] (SPARK-20195) SparkR to add createTable catalog API and deprecate createExternalTable

2017-04-02 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952815#comment-15952815
 ] 

Apache Spark commented on SPARK-20195:
--

User 'felixcheung' has created a pull request for this issue:
https://github.com/apache/spark/pull/17511

> SparkR to add createTable catalog API and deprecate createExternalTable
> ---
>
> Key: SPARK-20195
> URL: https://issues.apache.org/jira/browse/SPARK-20195
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR, SQL
>Affects Versions: 2.2.0
>Reporter: Felix Cheung
>Assignee: Felix Cheung
>
> Only naming differences for clarity, functionality is already supported with 
> and without path parameter.






[jira] [Assigned] (SPARK-20195) SparkR to add createTable catalog API and deprecate createExternalTable

2017-04-02 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20195:


Assignee: Apache Spark  (was: Felix Cheung)

> SparkR to add createTable catalog API and deprecate createExternalTable
> ---
>
> Key: SPARK-20195
> URL: https://issues.apache.org/jira/browse/SPARK-20195
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR, SQL
>Affects Versions: 2.2.0
>Reporter: Felix Cheung
>Assignee: Apache Spark
>
> Only naming differences for clarity, functionality is already supported with 
> and without path parameter.






[jira] [Created] (SPARK-20195) SparkR to add createTable catalog API and deprecate createExternalTable

2017-04-02 Thread Felix Cheung (JIRA)
Felix Cheung created SPARK-20195:


 Summary: SparkR to add createTable catalog API and deprecate 
createExternalTable
 Key: SPARK-20195
 URL: https://issues.apache.org/jira/browse/SPARK-20195
 Project: Spark
  Issue Type: Bug
  Components: SparkR, SQL
Affects Versions: 2.2.0
Reporter: Felix Cheung
Assignee: Felix Cheung


Only naming differences for clarity, functionality is already supported with 
and without path parameter.






[jira] [Resolved] (SPARK-20159) Support complete Catalog API in R

2017-04-02 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung resolved SPARK-20159.
--
   Resolution: Fixed
Fix Version/s: 2.2.0

> Support complete Catalog API in R
> -
>
> Key: SPARK-20159
> URL: https://issues.apache.org/jira/browse/SPARK-20159
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.2.0
>Reporter: Felix Cheung
>Assignee: Felix Cheung
> Fix For: 2.2.0
>
>
> As a user, I'd like to have access to the catalog API to manage and view 
> databases, tables, functions, and metadata.






[jira] [Commented] (SPARK-18570) Consider supporting other R formula operators

2017-04-02 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952808#comment-15952808
 ] 

Felix Cheung commented on SPARK-18570:
--

[~KrishnaKalyan3] are you working on this?

> Consider supporting other R formula operators
> -
>
> Key: SPARK-18570
> URL: https://issues.apache.org/jira/browse/SPARK-18570
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, SparkR
>Reporter: Felix Cheung
>Priority: Minor
>
> Such as
> {code}
> ∗ 
>  X∗Y include these variables and the interactions between them
> ^
>  (X + Z + W)^3 include these variables and all interactions up to three way
> |
>  X | Z conditioning: include x given z
> {code}
> Others include %in% and ` (backtick):
> https://stat.ethz.ch/R-manual/R-devel/library/stats/html/formula.html






[jira] [Commented] (SPARK-18822) Support ML Pipeline in SparkR

2017-04-02 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952806#comment-15952806
 ] 

Felix Cheung commented on SPARK-18822:
--

Right, sorry, I didn't get around to working on this...

> Support ML Pipeline in SparkR
> -
>
> Key: SPARK-18822
> URL: https://issues.apache.org/jira/browse/SPARK-18822
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, SparkR
>Reporter: Felix Cheung
>
> From Joseph Bradley:
> "
> Supporting Pipelines and advanced use cases: There really needs to be more 
> design discussion around SparkR. Felix Cheung would you be interested in 
> leading some discussion? I'm envisioning something similar to what was done a 
> while back for Pipelines in Scala/Java/Python, where we consider several use 
> cases of MLlib: fitting a single model, creating and tuning a complex 
> Pipeline, and working with multiple languages. That should help inform what 
> APIs should look like in Spark R.
> "
> Certain ML models, such as OneVsRest, are harder to represent in a single-call 
> R API. Having an advanced API or a Pipeline API like this could help expose 
> them to our users.






[jira] [Commented] (SPARK-20180) Unlimited max pattern length in Prefix span

2017-04-02 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952768#comment-15952768
 ] 

Sean Owen commented on SPARK-20180:
---

It sounds like you're arguing for no max at all -- what do you mean? that 
doesn't sound right.

It's possible that you didn't quite run the tests correctly, or that a test is 
flaky. I wouldn't worry about that, it's not clear this is a change to make 
anyway.
Again, we use pull requests to propose changes.

> Unlimited max pattern length in Prefix span
> ---
>
> Key: SPARK-20180
> URL: https://issues.apache.org/jira/browse/SPARK-20180
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 2.1.0
>Reporter: Cyril de Vogelaere
>Priority: Minor
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> Right now, we need to use the .setMaxPatternLength() method to
> specify the maximum pattern length of a sequence. Any pattern longer than 
> that won't be output.
> The current default maxPatternLength value is 10.
> This should be changed so that with input 0, all patterns of any length would 
> be output. Additionally, the default value should be changed to 0, so that 
> a new user could find all patterns in their dataset without looking at this 
> parameter.






[jira] [Comment Edited] (SPARK-20180) Unlimited max pattern length in Prefix span

2017-04-02 Thread Cyril de Vogelaere (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952618#comment-15952618
 ] 

Cyril de Vogelaere edited comment on SPARK-20180 at 4/2/17 4:50 PM:


Yes they can, it's really not a critical issue at all.
The current default pattern length also works well for most cases in practice, 
except for very large datasets where sequences are very long. But then I suppose 
people would know about the parameter and set it to a large value.

However, allowing unlimited pattern length would cost nothing in terms of 
performance; it's just an additional condition in an if. And it may be easier 
than always setting the highest value possible. At least, that option wouldn't 
hurt, and there was a TODO for it in the code. I think it would be good, even 
if we don't change the default value of 10. Changing the default makes more 
sense to me, but I get that, to preserve backward compatibility, we can't just 
change things as we want. So I will follow my senior's opinion on this.


Actually, I have quite a few improvements in store for PrefixSpan, since I 
worked on an algorithm for my master's thesis. Notably, a very performant 
implementation that specializes PrefixSpan for single-item patterns while 
slightly improving the performance of multi-item patterns. But I was told I 
needed to get familiar with contributing to Spark first ^^', which is why I'm 
proposing this small, non-critical improvement and implementing it.

I'm ready to push this small change anytime; it's already implemented. But the 
contributor wiki asks to run dev/run-tests before pushing, and it's been running 
for a day and a half already... Is that normal, by the way? Also, the tests 
already found some errors, but I'm 99.999% sure they're not mine. They're not 
even from the mllib module, which is the only thing I modified... Is that 
normal too? I suppose so, but I wouldn't want to waste the reviewers' time ^^'


was (Author: syrux):
Yes they can, it's really not a critical issue at all.
The current default pattern length also works well for most cases in practice, 
except for very large datasets where sequences are very long. But then I suppose 
people would know about the parameter and set it to a large value.

However, changing it to create a default value allowing unlimited pattern 
length would cost nothing in terms of performance; it's just an additional 
condition in an if. And it may be easier than always setting the highest value 
possible. At least, that option wouldn't hurt, and there was a TODO for it in 
the code.

Actually, I have quite a few improvements in store for PrefixSpan, since I 
worked on an algorithm for my master's thesis. Notably, a very performant 
implementation that specializes PrefixSpan for single-item patterns while 
slightly improving the performance of multi-item patterns. But I was told I 
needed to get familiar with contributing to Spark first ^^', which is why I'm 
proposing this small, non-critical improvement and implementing it.

I'm ready to push this small change anytime; it's already implemented. But the 
contributor wiki asks to run dev/run-tests before pushing, and it's been running 
for a day and a half already... Is that normal, by the way? Also, the tests 
already found some errors, but I'm 99.999% sure they're not mine. They're not 
even from the mllib module, which is the only thing I modified... Is that 
normal too? I suppose so, but I wouldn't want to waste the reviewers' time ^^'

> Unlimited max pattern length in Prefix span
> ---
>
> Key: SPARK-20180
> URL: https://issues.apache.org/jira/browse/SPARK-20180
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 2.1.0
>Reporter: Cyril de Vogelaere
>Priority: Minor
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> Right now, we need to use the .setMaxPatternLength() method to
> specify the maximum pattern length of a sequence. Any pattern longer than 
> that won't be output.
> The current default maxPatternLength value is 10.
> This should be changed so that with input 0, all patterns of any length would 
> be output. Additionally, the default value should be changed to 0, so that 
> a new user could find all patterns in their dataset without looking at this 
> parameter.






[jira] [Resolved] (SPARK-9414) HiveContext:saveAsTable creates wrong partition for existing hive table(append mode)

2017-04-02 Thread Hyukjin Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-9414.
-
Resolution: Cannot Reproduce

I am resolving this per the comment above. Please reopen this if I 
misunderstood and this issue still exists.

> HiveContext:saveAsTable creates wrong partition for existing hive 
> table(append mode)
> 
>
> Key: SPARK-9414
> URL: https://issues.apache.org/jira/browse/SPARK-9414
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.0
> Environment: Hadoop 2.6, Spark 1.4.0, Hive 0.14.0.
>Reporter: Chetan Dalal
>Priority: Critical
>
> Raising this bug because I found this issue was already reported on the Apache 
> mail archive and I am facing a similar issue.
> ---original---
> I am using Spark 1.4 and HiveContext to append data into a partitioned
> hive table. I found that the data inserted into the table is correct, but the
> partition (folder) created is totally wrong.
> {code}
>  val schemaString = "zone z year month date hh x y height u v w ph phb 
> p pb qvapor qgraup qnice qnrain tke_pbl el_pbl"
> val schema =
>   StructType(
> schemaString.split(" ").map(fieldName =>
>   if (fieldName.equals("zone") || fieldName.equals("z") ||
> fieldName.equals("year") || fieldName.equals("month") ||
>   fieldName.equals("date") || fieldName.equals("hh") ||
> fieldName.equals("x") || fieldName.equals("y"))
> StructField(fieldName, IntegerType, true)
>   else
> StructField(fieldName, FloatType, true)
> ))
> val pairVarRDD =
> sc.parallelize(Seq((Row(2,42,2009,3,1,0,218,365,9989.497.floatValue(),29.627113.floatValue(),19.071793.floatValue(),0.11982734.floatValue(),3174.6812.floatValue(),
> 97735.2.floatValue(),16.389032.floatValue(),-96.62891.floatValue(),25135.365.floatValue(),2.6476808E-5.floatValue(),0.0.floatValue(),13195.351.floatValue(),
> 0.0.floatValue(),0.1.floatValue(),0.0.floatValue()))
> ))
> val partitionedTestDF2 = sqlContext.createDataFrame(pairVarRDD, schema)
> partitionedTestDF2.write.format("org.apache.spark.sql.hive.orc.DefaultSource")
> .mode(org.apache.spark.sql.SaveMode.Append).partitionBy("zone","z","year","month").saveAsTable("test4DimBySpark")
> {code}
> -
> The table contains 23 columns (more than the Tuple maximum length), so I
> use a Row object to store the raw data, not a Tuple.
> Here are some messages from Spark when it saved the data:
> {code}
> 
> 15/06/16 10:39:22 INFO metadata.Hive: Renaming
> src:hdfs://service-10-0.local:8020/tmp/hive-patcharee/hive_2015-06-16_10-39-21_205_8768669104487548472-1/-ext-1/zone=13195/z=0/year=0/month=0/part-1;dest:
> hdfs://service-10-0.local:8020/apps/hive/warehouse/test4dimBySpark/zone=13195/z=0/year=0/month=0/part-1;Status:true
> 
> 15/06/16 10:39:22 INFO metadata.Hive: New loading path =
> hdfs://service-10-0.local:8020/tmp/hive-patcharee/hive_2015-06-16_10-39-21_205_8768669104487548472-1/-ext-1/zone=13195/z=0/year=0/month=0
> with partSpec {zone=13195, z=0, year=0, month=0}
> 
> From the raw data (pairVarRDD) zone = 2, z = 42, year = 2009, month =
> 3. But spark created a partition {zone=13195, z=0, year=0, month=0}. (x)
> 
> When I queried from hive>>
> 
> hive> select * from test4dimBySpark;
> OK
> 242200931.00.0218.0365.09989.497
> 29.62711319.0717930.11982734-3174.681297735.2 16.389032
> -96.6289125135.3652.6476808E-50.0 13195000
> hive> select zone, z, year, month from test4dimBySpark;
> OK
> 13195000
> hive> dfs -ls /apps/hive/warehouse/test4dimBySpark/*/*/*/*;
> Found 2 items
> -rw-r--r--   3 patcharee hdfs   1411 2015-06-16 10:39
> /apps/hive/warehouse/test4dimBySpark/zone=13195/z=0/year=0/month=0/part-1
> 
> The data stored in the table is correct zone = 2, z = 42, year = 2009,
> month = 3, but the partition created was wrong
> "zone=13195/z=0/year=0/month=0" (x)
> {code}






[jira] [Commented] (SPARK-19532) [Core]`DataStreamer for file` threads of DFSOutputStream leak if set `spark.speculation` to true

2017-04-02 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952730#comment-15952730
 ] 

Hyukjin Kwon commented on SPARK-19532:
--

It sounds like no one except the reporter could reproduce this. If you are going 
to fix it, that's great, but if not, I think it might be better to resolve this 
for now. Anyone can reopen it with more details and symptoms if they run into 
this issue.

I guess this will probably stay open without further action.

> [Core]`DataStreamer for file` threads of DFSOutputStream leak if set 
> `spark.speculation` to true
> 
>
> Key: SPARK-19532
> URL: https://issues.apache.org/jira/browse/SPARK-19532
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 2.1.0
>Reporter: StanZhai
>Priority: Critical
>
> When `spark.speculation` is set to true, on the Executor thread dump page of 
> the Web UI I found that there are about 1300 threads named "DataStreamer for 
> file 
> /test/data/test_temp/_temporary/0/_temporary/attempt_20170207172435_80750_m_69_1/part-00069-690407af-0900-46b1-9590-a6d6c696fe68.snappy.parquet"
>  in TIMED_WAITING state.
> {code}
> java.lang.Object.wait(Native Method)
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:564)
> {code}
> Off-heap memory usage grows a lot until the Executor exits with an OOM exception. 
> This problem occurs only when writing data to Hadoop (tasks may be killed 
> by the Executor during writing).
> Could this be related to [https://issues.apache.org/jira/browse/HDFS-9812]? 
> The version of Hadoop is 2.6.4.






[jira] [Resolved] (SPARK-20157) In the menu ‘Storage’in Web UI, click the Go button, and shows no paging menu interface.

2017-04-02 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-20157.
---
Resolution: Won't Fix

> In the menu ‘Storage’in Web UI, click the Go button, and shows no paging menu 
> interface.
> 
>
> Key: SPARK-20157
> URL: https://issues.apache.org/jira/browse/SPARK-20157
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 2.1.0
>Reporter: guoxiaolongzte
>Priority: Minor
> Attachments: After the change.png, Before the change1.png, Before the 
> change2.png
>
>
> In the 'show' text box, enter a number greater than or equal to the total 
> number of items. After clicking the "Go" button, the interface displays all of 
> the data, but the paging menu disappears. If I then want to use the "show" 
> text box again to change how many items are displayed, I can only leave the 
> page and click the specific link again to come back in.






[jira] [Assigned] (SPARK-20194) Support partition pruning for InMemoryCatalog

2017-04-02 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20194:


Assignee: (was: Apache Spark)

> Support partition pruning for InMemoryCatalog
> -
>
> Key: SPARK-20194
> URL: https://issues.apache.org/jira/browse/SPARK-20194
> Project: Spark
>  Issue Type: Improvement
>  Components: Optimizer
>Affects Versions: 2.1.0
>Reporter: Adrian Ionescu
>
> {{listPartitionsByFilter()}} is not yet implemented for {{InMemoryCatalog}}:
> {quote}
>  // TODO: Provide an implementation
> throw new UnsupportedOperationException(
>   "listPartitionsByFilter is not implemented for InMemoryCatalog")
> {quote}
> Because of this, there is a hack in {{FindDataSourceTable}} that avoids 
> passing along the {{CatalogTable}} to the {{DataSource}} it creates when the 
> catalog implementation is not "hive", so that, when the latter is resolved, 
> an {{InMemoryFileIndex}} is created instead of a {{CatalogFileIndex}} which 
> the {{PruneFileSourcePartitions}} rule matches for.
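
For illustration, a hedged sketch (not from the issue) of the scenario this improvement targets: a partitioned data source table managed by the default in-memory catalog, where a filter on the partition column could prune partitions once listPartitionsByFilter is implemented. Table and column names are made up.

{code}
// Assumes spark.sql.catalogImplementation=in-memory (the default without Hive support).
spark.sql("CREATE TABLE events (id INT, day STRING) USING parquet PARTITIONED BY (day)")
spark.sql("INSERT INTO events VALUES (1, '2017-04-01'), (2, '2017-04-02')")

// With listPartitionsByFilter unimplemented, this query cannot take the
// CatalogFileIndex / PruneFileSourcePartitions path described above.
spark.sql("SELECT * FROM events WHERE day = '2017-04-02'").show()
{code}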






[jira] [Assigned] (SPARK-20194) Support partition pruning for InMemoryCatalog

2017-04-02 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20194:


Assignee: Apache Spark

> Support partition pruning for InMemoryCatalog
> -
>
> Key: SPARK-20194
> URL: https://issues.apache.org/jira/browse/SPARK-20194
> Project: Spark
>  Issue Type: Improvement
>  Components: Optimizer
>Affects Versions: 2.1.0
>Reporter: Adrian Ionescu
>Assignee: Apache Spark
>
> {{listPartitionsByFilter()}} is not yet implemented for {{InMemoryCatalog}}:
> {quote}
>  // TODO: Provide an implementation
> throw new UnsupportedOperationException(
>   "listPartitionsByFilter is not implemented for InMemoryCatalog")
> {quote}
> Because of this, there is a hack in {{FindDataSourceTable}} that avoids 
> passing along the {{CatalogTable}} to the {{DataSource}} it creates when the 
> catalog implementation is not "hive", so that, when the latter is resolved, 
> an {{InMemoryFileIndex}} is created instead of a {{CatalogFileIndex}} which 
> the {{PruneFileSourcePartitions}} rule matches for.






[jira] [Commented] (SPARK-20194) Support partition pruning for InMemoryCatalog

2017-04-02 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952710#comment-15952710
 ] 

Apache Spark commented on SPARK-20194:
--

User 'adrian-ionescu' has created a pull request for this issue:
https://github.com/apache/spark/pull/17510

> Support partition pruning for InMemoryCatalog
> -
>
> Key: SPARK-20194
> URL: https://issues.apache.org/jira/browse/SPARK-20194
> Project: Spark
>  Issue Type: Improvement
>  Components: Optimizer
>Affects Versions: 2.1.0
>Reporter: Adrian Ionescu
>
> {{listPartitionsByFilter()}} is not yet implemented for {{InMemoryCatalog}}:
> {quote}
>  // TODO: Provide an implementation
> throw new UnsupportedOperationException(
>   "listPartitionsByFilter is not implemented for InMemoryCatalog")
> {quote}
> Because of this, there is a hack in {{FindDataSourceTable}} that avoids 
> passing along the {{CatalogTable}} to the {{DataSource}} it creates when the 
> catalog implementation is not "hive", so that, when the latter is resolved, 
> an {{InMemoryFileIndex}} is created instead of a {{CatalogFileIndex}} which 
> the {{PruneFileSourcePartitions}} rule matches for.






[jira] [Resolved] (SPARK-20135) spark thriftserver2: no job running but containers not release on yarn

2017-04-02 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-20135.
---
Resolution: Invalid

> spark thriftserver2: no job running but containers not release on yarn
> --
>
> Key: SPARK-20135
> URL: https://issues.apache.org/jira/browse/SPARK-20135
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.1
> Environment: spark 2.0.1 with hadoop 2.6.0 
>Reporter: bruce xu
> Attachments: 0329-1.png, 0329-2.png, 0329-3.png
>
>
> I enabled the executor dynamic allocation feature; however, it sometimes 
> doesn't work.
> I set the initial executor number to 50; after the job finished, the cores and 
> memory resources were not released. 
> In the Spark web UI, the active job/running task/stage count is 0, but the 
> executors page shows 1276 cores and 7288 active tasks.
> In the YARN web UI, the thriftserver job's running container count is 639, 
> without being released. 
> This may be a bug. 






[jira] [Resolved] (SPARK-20141) jdbc query gives ORA-00903

2017-04-02 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-20141.
---
Resolution: Not A Problem

> jdbc query gives ORA-00903
> --
>
> Key: SPARK-20141
> URL: https://issues.apache.org/jira/browse/SPARK-20141
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 2.0.2
> Environment: Windows7
>Reporter: sergio
>  Labels: windows
> Attachments: exception.png
>
>
> Error while querying an external Oracle database. 
> It works this way, and then I can work with jdbcDF:
> val jdbcDF = sqlContext.read.format("jdbc").options(
>   Map("url" -> "jdbc:oracle:thin:@//crmdbmr.cgs.comp.ru:1521/crmrmir",
>   "user" -> "my_login",
>   "password" -> "my_password",
>   "dbtable" -> "siebel.table1")).load() 
> But when trying to send a query, it fails:
> val jdbcDF = sqlContext.read.format("jdbc").options(
>   Map("url" -> "jdbc:oracle:thin:@//crmdbmr.cgs.comp.ru:1521/crmrmir",
>   "user" -> "my_login",
>   "password" -> "my_password",
>   "dbtable" -> "select * from siebel.table1 where call_id= '1-1TMC4D4U'")).load() 
> This query works fine in SQL Developer, or when I registerTempTable, but when 
> I put a direct query instead of schema.table, it gives this error:
> java.sql.SQLSyntaxErrorException: ORA-00903:
> It looks like Spark sends the wrong query.
> I tried everything in "JDBC To Other Databases":
> http://spark.apache.org/docs/latest/sql-programming-guide.html
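
For what it's worth, a hedged sketch (not from the issue) of the documented way to push a query through the JDBC source: dbtable accepts anything valid in a SQL FROM clause, so the query goes in as a parenthesized subquery with an alias. The connection options below are the placeholders from the report.

{code}
val jdbcDF = sqlContext.read.format("jdbc").options(
  Map("url" -> "jdbc:oracle:thin:@//crmdbmr.cgs.comp.ru:1521/crmrmir",
      "user" -> "my_login",
      "password" -> "my_password",
      // A parenthesized subquery with an alias is valid wherever a table name is:
      "dbtable" -> "(select * from siebel.table1 where call_id = '1-1TMC4D4U') t")).load()
{code}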






[jira] [Assigned] (SPARK-20173) Throw NullPointerException when HiveThriftServer2 is shutdown

2017-04-02 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reassigned SPARK-20173:
-

Assignee: zuotingbing
Priority: Minor  (was: Major)

> Throw NullPointerException when HiveThriftServer2 is shutdown
> -
>
> Key: SPARK-20173
> URL: https://issues.apache.org/jira/browse/SPARK-20173
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: zuotingbing
>Assignee: zuotingbing
>Priority: Minor
> Fix For: 2.2.0
>
>
> Throw NullPointerException when HiveThriftServer2 is shutdown:
> 
> 2017-03-30 11:52:56,355 ERROR Utils: Uncaught exception in thread Thread-2
> java.lang.NullPointerException
>   at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$$anonfun$main$1.apply$mcV$sp(HiveThriftServer2.scala:85)
>   at 
> org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:215)
>   at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:187)
>   at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:187)
>   at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:187)
>   at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1953)
>   at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:187)
>   at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:187)
>   at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:187)
>   at scala.util.Try$.apply(Try.scala:192)
>   at 
> org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:187)
>   at 
> org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:177)
>   at 
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
> 2017-03-30 11:52:56,357 INFO ShutdownHookManager: Shutdown hook called






[jira] [Resolved] (SPARK-20173) Throw NullPointerException when HiveThriftServer2 is shutdown

2017-04-02 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-20173.
---
   Resolution: Fixed
Fix Version/s: 2.2.0

Issue resolved by pull request 17496
[https://github.com/apache/spark/pull/17496]

> Throw NullPointerException when HiveThriftServer2 is shutdown
> -
>
> Key: SPARK-20173
> URL: https://issues.apache.org/jira/browse/SPARK-20173
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: zuotingbing
> Fix For: 2.2.0
>
>
> Throw NullPointerException when HiveThriftServer2 is shutdown:
> 
> 2017-03-30 11:52:56,355 ERROR Utils: Uncaught exception in thread Thread-2
> java.lang.NullPointerException
>   at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$$anonfun$main$1.apply$mcV$sp(HiveThriftServer2.scala:85)
>   at 
> org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:215)
>   at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:187)
>   at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:187)
>   at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:187)
>   at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1953)
>   at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:187)
>   at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:187)
>   at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:187)
>   at scala.util.Try$.apply(Try.scala:192)
>   at 
> org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:187)
>   at 
> org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:177)
>   at 
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
> 2017-03-30 11:52:56,357 INFO ShutdownHookManager: Shutdown hook called
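
The actual change went in through pull request 17496; purely as an 
illustration of the defensive pattern (the names below are hypothetical, not 
the real fields), keeping possibly-uninitialized resources in an Option 
prevents a shutdown hook from dereferencing null:
{code}
object SafeShutdownSketch {
  // Hypothetical stand-in for a resource (e.g. a UI tab) that may never have
  // been created because startup failed part-way through.
  @volatile private var uiTab: Option[AnyRef] = None

  def installHook(): Unit = sys.addShutdownHook {
    // An empty Option is simply skipped instead of throwing a NullPointerException.
    uiTab.foreach(tab => println(s"detaching $tab"))
  }
}
{code}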






[jira] [Updated] (SPARK-19999) Test failures in Spark Core due to java.nio.Bits.unaligned()

2017-04-02 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-19999:
--
Fix Version/s: 2.1.1

> Test failures in Spark Core due to java.nio.Bits.unaligned()
> 
>
> Key: SPARK-19999
> URL: https://issues.apache.org/jira/browse/SPARK-19999
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.1.0
> Environment: Ubuntu 14.04 ppc64le 
> $ java -version
> openjdk version "1.8.0_111"
> OpenJDK Runtime Environment (build 1.8.0_111-8u111-b14-3~14.04.1-b14)
> OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)
>Reporter: Sonia Garudi
>Assignee: Sonia Garudi
>Priority: Minor
>  Labels: ppc64le
> Fix For: 2.1.1, 2.2.0
>
> Attachments: Core.patch
>
>
> There are multiple test failures in the Spark Core project with the 
> following error message:
> {code:borderStyle=solid}
> java.lang.IllegalArgumentException: requirement failed: No support for 
> unaligned Unsafe. Set spark.memory.offHeap.enabled to false.
> {code}
> These errors occur because java.nio.Bits.unaligned() does not return true 
> on the ppc64le architecture.
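
As a quick way to see the underlying value on a given machine (a sketch only, 
not Spark's actual check, which lives in the unsafe Platform code), the 
package-private java.nio.Bits.unaligned() can be probed via reflection on 
JDK 8:
{code}
// Probe java.nio.Bits.unaligned() via reflection; the method is package-private.
val bits = Class.forName("java.nio.Bits")
val unalignedMethod = bits.getDeclaredMethod("unaligned")
unalignedMethod.setAccessible(true)
val unaligned = unalignedMethod.invoke(null).asInstanceOf[Boolean]
// Prints false on ppc64le, which is why the off-heap tests bail out there.
println(s"java.nio.Bits.unaligned() = $unaligned")
{code}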






[jira] [Assigned] (SPARK-20123) $SPARK_HOME variable might have spaces in it(e.g. $SPARK_HOME=/home/spark build/spark), then build spark failed.

2017-04-02 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reassigned SPARK-20123:
-

Assignee: zuotingbing

> $SPARK_HOME variable might have spaces in it(e.g. $SPARK_HOME=/home/spark 
> build/spark), then build spark failed.
> 
>
> Key: SPARK-20123
> URL: https://issues.apache.org/jira/browse/SPARK-20123
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.1.0
>Reporter: zuotingbing
>Assignee: zuotingbing
>Priority: Minor
> Fix For: 2.2.0
>
>
> If the $SPARK_HOME or $FWDIR variable contains spaces, then building Spark 
> with "./dev/make-distribution.sh --name custom-spark --tgz -Psparkr 
> -Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos -Pyarn" fails.






[jira] [Resolved] (SPARK-20123) $SPARK_HOME variable might have spaces in it(e.g. $SPARK_HOME=/home/spark build/spark), then build spark failed.

2017-04-02 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-20123.
---
   Resolution: Fixed
Fix Version/s: 2.2.0

Issue resolved by pull request 17452
[https://github.com/apache/spark/pull/17452]

> $SPARK_HOME variable might have spaces in it(e.g. $SPARK_HOME=/home/spark 
> build/spark), then build spark failed.
> 
>
> Key: SPARK-20123
> URL: https://issues.apache.org/jira/browse/SPARK-20123
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.1.0
>Reporter: zuotingbing
>Priority: Minor
> Fix For: 2.2.0
>
>
> If the $SPARK_HOME or $FWDIR variable contains spaces, then building Spark 
> with "./dev/make-distribution.sh --name custom-spark --tgz -Psparkr 
> -Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos -Pyarn" fails.






[jira] [Resolved] (SPARK-20143) DataType.fromJson should throw an exception with better message

2017-04-02 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-20143.
-
   Resolution: Fixed
 Assignee: Hyukjin Kwon
Fix Version/s: 2.2.0

> DataType.fromJson should throw an exception with better message
> ---
>
> Key: SPARK-20143
> URL: https://issues.apache.org/jira/browse/SPARK-20143
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
> Fix For: 2.2.0
>
>
> Currently, 
> {code}
> scala> import org.apache.spark.sql.types.DataType
> import org.apache.spark.sql.types.DataType
> scala> DataType.fromJson(""""abcd"""")
> java.util.NoSuchElementException: key not found: abcd
>   at scala.collection.MapLike$class.default(MapLike.scala:228)
>   at scala.collection.AbstractMap.default(Map.scala:59)
>   at scala.collection.MapLike$class.apply(MapLike.scala:141)
>   at scala.collection.AbstractMap.apply(Map.scala:59)
>   at org.apache.spark.sql.types.DataType$.nameToType(DataType.scala:118)
>   at org.apache.spark.sql.types.DataType$.parseDataType(DataType.scala:132)
>   at org.apache.spark.sql.types.DataType$.fromJson(DataType.scala:104)
>   ... 48 elided
> scala> DataType.fromJson( """{"abcd":"a"}""")
> scala.MatchError: JObject(List((abcd,JString(a)))) (of class 
> org.json4s.JsonAST$JObject)
>   at org.apache.spark.sql.types.DataType$.parseDataType(DataType.scala:130)
>   at org.apache.spark.sql.types.DataType$.fromJson(DataType.scala:104)
>   ... 48 elided
> scala> DataType.fromJson( """{"fields": [{"a":123}], "type": "struct"}""")
> scala.MatchError: JObject(List((a,JInt(123)))) (of class 
> org.json4s.JsonAST$JObject)
>   at 
> org.apache.spark.sql.types.DataType$.org$apache$spark$sql$types$DataType$$parseStructField(DataType.scala:169)
>   at 
> org.apache.spark.sql.types.DataType$$anonfun$parseDataType$1.apply(DataType.scala:150)
>   at 
> org.apache.spark.sql.types.DataType$$anonfun$parseDataType$1.apply(DataType.scala:150)
>   at scala.collection.immutable.List.map(List.scala:273)
>   at org.apache.spark.sql.types.DataType$.parseDataType(DataType.scala:150)
>   at org.apache.spark.sql.types.DataType$.fromJson(DataType.scala:104)
>   ... 48 elided
> {code}
> {{DataType.fromJson}} throws non-readable error messages for the json input. 
> We could improve this rather than throwing {{scala.MatchError}} or 
> {{java.util.NoSuchElementException}}.
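
As a hedged sketch of the kind of improvement being proposed (an illustration, 
not the patch that was merged), the call can be wrapped so that any malformed 
input yields one readable message:
{code}
import org.apache.spark.sql.types.DataType
import scala.util.control.NonFatal

// Wrap fromJson so parse failures surface as a single, readable error.
def dataTypeFromJson(json: String): DataType =
  try DataType.fromJson(json) catch {
    case NonFatal(e) =>
      throw new IllegalArgumentException(
        s"Failed to convert the JSON string '$json' to a data type.", e)
  }
{code}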






[jira] [Created] (SPARK-20194) Support partition pruning for InMemoryCatalog

2017-04-02 Thread Adrian Ionescu (JIRA)
Adrian Ionescu created SPARK-20194:
--

 Summary: Support partition pruning for InMemoryCatalog
 Key: SPARK-20194
 URL: https://issues.apache.org/jira/browse/SPARK-20194
 Project: Spark
  Issue Type: Improvement
  Components: Optimizer
Affects Versions: 2.1.0
Reporter: Adrian Ionescu


{{listPartitionsByFilter()}} is not yet implemented for {{InMemoryCatalog}}:
{quote}
 // TODO: Provide an implementation
throw new UnsupportedOperationException(
  "listPartitionsByFilter is not implemented for InMemoryCatalog")
{quote}

Because of this, there is a hack in {{FindDataSourceTable}} that avoids passing 
the {{CatalogTable}} along to the {{DataSource}} it creates when the catalog 
implementation is not "hive", so that, when the latter is resolved, an 
{{InMemoryFileIndex}} is created instead of a {{CatalogFileIndex}}, which is 
what the {{PruneFileSourcePartitions}} rule matches on.
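
Purely as a conceptual sketch of what listPartitionsByFilter would provide 
(the types and names below are invented for illustration and are not Spark's 
catalyst catalog API), partition pruning means returning only the partitions 
whose spec satisfies the pushed-down predicates instead of listing them all:
{code}
// Conceptual stand-ins only, not org.apache.spark.sql.catalyst.catalog classes.
final case class PartitionSpec(values: Map[String, String])

def prunePartitions(
    partitions: Seq[PartitionSpec],
    predicates: Seq[PartitionSpec => Boolean]): Seq[PartitionSpec] =
  partitions.filter(p => predicates.forall(_(p)))

val parts = Seq(
  PartitionSpec(Map("dt" -> "2017-04-01")),
  PartitionSpec(Map("dt" -> "2017-04-02")))

// Pruning with the predicate dt = '2017-04-02' keeps a single partition.
val pruned = prunePartitions(parts, Seq(_.values.get("dt").contains("2017-04-02")))
{code}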






[jira] [Updated] (SPARK-20193) Selecting empty struct causes ExpressionEncoder error.

2017-04-02 Thread Adrian Ionescu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrian Ionescu updated SPARK-20193:
---
Description: 
{{def struct(cols: Column*): Column}}
Given the above signature and the lack of any note in the docs saying that a 
struct with no columns is not supported, I would expect the following to work:
{{spark.range(3).select(col("id"), struct().as("empty_struct")).collect}}

However, this results in:
{quote}
java.lang.AssertionError: assertion failed: each serializer expression should 
contains at least one `BoundReference`
  at scala.Predef$.assert(Predef.scala:170)
  at 
org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$$anonfun$11.apply(ExpressionEncoder.scala:240)
  at 
org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$$anonfun$11.apply(ExpressionEncoder.scala:238)
  at 
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at 
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at scala.collection.immutable.List.foreach(List.scala:381)
  at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
  at scala.collection.immutable.List.flatMap(List.scala:344)
  at 
org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.<init>(ExpressionEncoder.scala:238)
  at 
org.apache.spark.sql.catalyst.encoders.RowEncoder$.apply(RowEncoder.scala:63)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
  at 
org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withPlan(Dataset.scala:2837)
  at org.apache.spark.sql.Dataset.select(Dataset.scala:1131)
  ... 39 elided
{quote}

  was:
{{def struct(cols: Column*): Column}}
Given the above signature and the lack of any note in the docs that a struct 
with no columns is not supported, I would expect the following to work:
{{spark.range(3).select(col("id"), struct().as("empty_struct")).collect}}

However, this results in:
{quote}
java.lang.AssertionError: assertion failed: each serializer expression should 
contains at least one `BoundReference`
  at scala.Predef$.assert(Predef.scala:170)
  at 
org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$$anonfun$11.apply(ExpressionEncoder.scala:240)
  at 
org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$$anonfun$11.apply(ExpressionEncoder.scala:238)
  at 
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at 
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at scala.collection.immutable.List.foreach(List.scala:381)
  at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
  at scala.collection.immutable.List.flatMap(List.scala:344)
  at 
org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.<init>(ExpressionEncoder.scala:238)
  at 
org.apache.spark.sql.catalyst.encoders.RowEncoder$.apply(RowEncoder.scala:63)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
  at 
org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withPlan(Dataset.scala:2837)
  at org.apache.spark.sql.Dataset.select(Dataset.scala:1131)
  ... 39 elided
{quote}


> Selecting empty struct causes ExpressionEncoder error.
> --
>
> Key: SPARK-20193
> URL: https://issues.apache.org/jira/browse/SPARK-20193
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
>Reporter: Adrian Ionescu
>  Labels: struct
>
> {{def struct(cols: Column*): Column}}
> Given the above signature and the lack of any note in the docs saying that a 
> struct with no columns is not supported, I would expect the following to work:
> {{spark.range(3).select(col("id"), struct().as("empty_struct")).collect}}
> However, this results in:
> {quote}
> java.lang.AssertionError: assertion failed: each serializer expression should 
> contains at least one `BoundReference`
>   at scala.Predef$.assert(Predef.scala:170)
>   at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$$anonfun$11.apply(ExpressionEncoder.scala:240)
>   at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$$anonfun$11.apply(ExpressionEncoder.scala:238)
>   at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>   at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
>   at scala.collection.immutable.List.flatMap(List.scala:344)
>   at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.<init>(ExpressionEncoder.scala:238)
>   at 
> org.apache.spark.sql.catalyst.encoders.RowEncoder$.apply(RowEncoder.scala:63)
>   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
>   at 
> 

[jira] [Created] (SPARK-20193) Selecting empty struct causes ExpressionEncoder error.

2017-04-02 Thread Adrian Ionescu (JIRA)
Adrian Ionescu created SPARK-20193:
--

 Summary: Selecting empty struct causes ExpressionEncoder error.
 Key: SPARK-20193
 URL: https://issues.apache.org/jira/browse/SPARK-20193
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.1.0
Reporter: Adrian Ionescu


{{def struct(cols: Column*): Column}}
Given the above signature and the lack of any note in the docs that a struct 
with no columns is not supported, I would expect the following to work:
{{spark.range(3).select(col("id"), struct().as("empty_struct")).collect}}

However, this results in:
{quote}
java.lang.AssertionError: assertion failed: each serializer expression should 
contains at least one `BoundReference`
  at scala.Predef$.assert(Predef.scala:170)
  at 
org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$$anonfun$11.apply(ExpressionEncoder.scala:240)
  at 
org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$$anonfun$11.apply(ExpressionEncoder.scala:238)
  at 
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at 
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at scala.collection.immutable.List.foreach(List.scala:381)
  at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
  at scala.collection.immutable.List.flatMap(List.scala:344)
  at 
org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.<init>(ExpressionEncoder.scala:238)
  at 
org.apache.spark.sql.catalyst.encoders.RowEncoder$.apply(RowEncoder.scala:63)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
  at 
org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withPlan(Dataset.scala:2837)
  at org.apache.spark.sql.Dataset.select(Dataset.scala:1131)
  ... 39 elided
{quote}
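
Until empty structs are either supported or rejected with a clearer error, the 
practical workaround is to give the struct at least one column; a minimal 
sketch (the placeholder column is arbitrary and assumes an active SparkSession 
named spark):
{code}
import org.apache.spark.sql.functions.{col, struct}

// Works because the struct has at least one bound column.
spark.range(3)
  .select(col("id"), struct(col("id").as("placeholder")).as("almost_empty_struct"))
  .collect()
{code}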






[jira] [Comment Edited] (SPARK-20180) Unlimited max pattern length in Prefix span

2017-04-02 Thread Cyril de Vogelaere (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952618#comment-15952618
 ] 

Cyril de Vogelaere edited comment on SPARK-20180 at 4/2/17 10:39 AM:
-

Yes they can; it's really not a critical issue at all.
The current maximum pattern length also works well in practice for most 
cases, except on very large datasets where sequences are very long. But then 
I suppose people would know about the parameter and set it to a large value.

However, changing the default so that pattern length can be unlimited would 
cost nothing in terms of performance; it's just an additional condition in an 
if. And it may be easier than always setting the highest possible value. At 
least, that option wouldn't hurt, and there was a TODO for it in the code.

Actually, I have quite a few improvements in store for PrefixSpan, since I 
worked on an algorithm for my master's thesis. Notably, a very performant 
implementation that specializes PrefixSpan for single-item patterns while 
slightly improving the performance of multi-item patterns. But I was told I 
needed to get familiar with contributing to Spark first, which is why I'm 
proposing, and implementing, this small, non-critical improvement.

I'm ready to push this small change anytime; it's already implemented. But 
the contributor wiki asks to run dev/run-tests before pushing, and it has 
been running for a day and a half already... Is that normal, by the way? 
Also, the tests already found some errors, but I'm 99.999% sure they're not 
mine; they're not even from the mllib module, which is the only thing I 
modified... Is that normal too? I suppose so, but I wouldn't want to waste 
the reviewers' time.


was (Author: syrux):
Yes they can; it's really not a critical issue at all.
The current maximum pattern length also works well in practice for most 
cases, except on very large datasets where sequences are very long. But then 
I suppose people would know about the parameter and set it to a large value.

However, changing the default so that pattern length can be unlimited would 
cost nothing in terms of performance; it's just an additional condition in an 
if. And it may be easier than always setting the highest possible value. At 
least, that option wouldn't hurt.

Actually, I have quite a few improvements in store for PrefixSpan, since I 
worked on an algorithm for my master's thesis. Notably, a very performant 
implementation that specializes PrefixSpan for single-item patterns while 
slightly improving the performance of multi-item patterns. But I was told I 
needed to get familiar with contributing to Spark first, which is why I'm 
proposing, and implementing, this small, non-critical improvement.

I'm ready to push this small change anytime; it's already implemented. But 
the contributor wiki asks to run dev/run-tests before pushing, and it has 
been running for a day and a half already... Is that normal, by the way? 
Also, the tests already found some errors, but I'm 99.999% sure they're not 
mine; they're not even from the mllib module, which is the only thing I 
modified... Is that normal too? I suppose so, but I wouldn't want to waste 
the reviewers' time.

> Unlimited max pattern length in Prefix span
> ---
>
> Key: SPARK-20180
> URL: https://issues.apache.org/jira/browse/SPARK-20180
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 2.1.0
>Reporter: Cyril de Vogelaere
>Priority: Minor
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> Right now, we need to use the .setMaxPatternLength() method to
> specify the maximum pattern length of a sequence. Any pattern longer than 
> that won't be output.
> The current default maxPatternLength value is 10.
> This should be changed so that with input 0, patterns of any length are 
> output. Additionally, the default value should be changed to 0, so that 
> a new user can find all patterns in a dataset without looking at this 
> parameter.
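
For reference, current usage of the RDD-based API (this mirrors the MLlib 
programming guide example); the proposal is for setMaxPatternLength(0) to mean 
"no limit" and for 0 to become the default:
{code}
import org.apache.spark.mllib.fpm.PrefixSpan

val sequences = sc.parallelize(Seq(
  Array(Array(1, 2), Array(3)),
  Array(Array(1), Array(3, 2), Array(1, 2)),
  Array(Array(1, 2), Array(5)),
  Array(Array(6))), 2).cache()

val model = new PrefixSpan()
  .setMinSupport(0.5)
  .setMaxPatternLength(10)   // today's default; long patterns require raising it by hand
  .run(sequences)

model.freqSequences.collect().foreach { freqSequence =>
  println(
    freqSequence.sequence.map(_.mkString("[", ", ", "]")).mkString("[", ", ", "]") +
      ", " + freqSequence.freq)
}
{code}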






[jira] [Commented] (SPARK-20180) Unlimited max pattern length in Prefix span

2017-04-02 Thread Cyril de Vogelaere (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952618#comment-15952618
 ] 

Cyril de Vogelaere commented on SPARK-20180:


Yes they can; it's really not a critical issue at all.
The current maximum pattern length also works well in practice for most 
cases, except on very large datasets where sequences are very long. But then 
I suppose people would know about the parameter and set it to a large value.

However, changing the default so that pattern length can be unlimited would 
cost nothing in terms of performance; it's just an additional condition in an 
if. And it may be easier than always setting the highest possible value. At 
least, that option wouldn't hurt.

Actually, I have quite a few improvements in store for PrefixSpan, since I 
worked on an algorithm for my master's thesis. Notably, a very performant 
implementation that specializes PrefixSpan for single-item patterns while 
slightly improving the performance of multi-item patterns. But I was told I 
needed to get familiar with contributing to Spark first, which is why I'm 
proposing, and implementing, this small, non-critical improvement.

I'm ready to push this small change anytime; it's already implemented. But 
the contributor wiki asks to run dev/run-tests before pushing, and it has 
been running for a day and a half already... Is that normal, by the way? 
Also, the tests already found some errors, but I'm 99.999% sure they're not 
mine; they're not even from the mllib module, which is the only thing I 
modified... Is that normal too? I suppose so, but I wouldn't want to waste 
the reviewers' time.

> Unlimited max pattern length in Prefix span
> ---
>
> Key: SPARK-20180
> URL: https://issues.apache.org/jira/browse/SPARK-20180
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 2.1.0
>Reporter: Cyril de Vogelaere
>Priority: Minor
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> Right now, we need to use the .setMaxPatternLength() method to
> specify the maximum pattern length of a sequence. Any pattern longer than 
> that won't be output.
> The current default maxPatternLength value is 10.
> This should be changed so that with input 0, patterns of any length are 
> output. Additionally, the default value should be changed to 0, so that 
> a new user can find all patterns in a dataset without looking at this 
> parameter.






[jira] [Assigned] (SPARK-20190) '/applications/[app-id]/jobs' in rest api,status should be [running|succeeded|failed|unknown]

2017-04-02 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reassigned SPARK-20190:
-

Assignee: guoxiaolongzte
Priority: Trivial  (was: Minor)

> '/applications/[app-id]/jobs' in rest api,status should be 
> [running|succeeded|failed|unknown]
> -
>
> Key: SPARK-20190
> URL: https://issues.apache.org/jira/browse/SPARK-20190
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 2.1.0
>Reporter: guoxiaolongzte
>Assignee: guoxiaolongzte
>Priority: Trivial
>
> For '/applications/[app-id]/jobs' in the REST API, the documented status 
> values should be '[running|succeeded|failed|unknown]'.
> The documentation currently lists '[complete|succeeded|failed]',
> but for '/applications/[app-id]/jobs?status=complete' the server returns 
> 'HTTP ERROR 404'.
> Added '?status=running' and '?status=unknown'.
> Code:
> public enum JobExecutionStatus {
>   RUNNING,
>   SUCCEEDED,
>   FAILED,
>   UNKNOWN;
> }
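
A small sketch of exercising the status filter against a running application 
UI (host, port, and application id are placeholders):
{code}
import scala.io.Source

val appId = "app-20170402120000-0001"  // placeholder
val url = s"http://localhost:4040/api/v1/applications/$appId/jobs?status=running"
println(Source.fromURL(url).mkString)  // JSON array of running jobs
{code}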






[jira] [Commented] (SPARK-19999) Test failures in Spark Core due to java.nio.Bits.unaligned()

2017-04-02 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952568#comment-15952568
 ] 

Apache Spark commented on SPARK-19999:
--

User 'kiszk' has created a pull request for this issue:
https://github.com/apache/spark/pull/17509

> Test failures in Spark Core due to java.nio.Bits.unaligned()
> 
>
> Key: SPARK-19999
> URL: https://issues.apache.org/jira/browse/SPARK-19999
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.1.0
> Environment: Ubuntu 14.04 ppc64le 
> $ java -version
> openjdk version "1.8.0_111"
> OpenJDK Runtime Environment (build 1.8.0_111-8u111-b14-3~14.04.1-b14)
> OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)
>Reporter: Sonia Garudi
>Assignee: Sonia Garudi
>Priority: Minor
>  Labels: ppc64le
> Fix For: 2.2.0
>
> Attachments: Core.patch
>
>
> There are multiple test failures in the Spark Core project with the 
> following error message:
> {code:borderStyle=solid}
> java.lang.IllegalArgumentException: requirement failed: No support for 
> unaligned Unsafe. Set spark.memory.offHeap.enabled to false.
> {code}
> These errors occur because java.nio.Bits.unaligned() does not return true 
> on the ppc64le architecture.


