[GitHub] spark issue #15666: [SPARK-11421] [Core][Python][R] Added ability for addJar...

2018-04-06 Thread mariusvniekerk
Github user mariusvniekerk commented on the issue:

https://github.com/apache/spark/pull/15666
  
This was superseded by #19643.





[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2018-04-06 Thread mariusvniekerk
Github user mariusvniekerk closed the pull request at:

https://github.com/apache/spark/pull/15666





[GitHub] spark pull request #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for ...

2017-11-06 Thread mariusvniekerk
Github user mariusvniekerk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19643#discussion_r149085525
  
--- Diff: R/pkg/R/context.R ---
@@ -319,6 +319,27 @@ spark.addFile <- function(path, recursive = FALSE) {
   invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive))
 }
 
+#' Adds a JAR dependency for Spark tasks to be executed in the future.
+#'
+#' The \code{path} passed can be either a local file, a file in HDFS (or other Hadoop-supported
+#' filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
+#' If \code{addToCurrentClassLoader} is true, add the jar to the current driver.
--- End diff --

Maybe something like `underlying/backing java process`?





[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-06-17 Thread mariusvniekerk
Github user mariusvniekerk commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r122578282
  
--- Diff: R/pkg/R/context.R ---
@@ -319,6 +319,34 @@ spark.addFile <- function(path, recursive = FALSE) {
   invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive))
 }
 
+
+#' Adds a JAR dependency for all tasks to be executed on this SparkContext in the future.
--- End diff --

In that case do we want to bother having this method for R?





[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-06-17 Thread mariusvniekerk
Github user mariusvniekerk commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r122578275
  
--- Diff: R/pkg/R/context.R ---
@@ -319,6 +319,34 @@ spark.addFile <- function(path, recursive = FALSE) {
   invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive))
 }
 
+
+#' Adds a JAR dependency for all tasks to be executed on this SparkContext in the future.
+#'
+#' The \code{path} passed can be either a local file, a file in HDFS (or other Hadoop-supported
+#' filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
+#' If \code{addToCurrentClassLoader} is true, add the jar to the current threads' classloader. In
+#' general adding to the current threads' class loader will impact all other application threads
+#' unless they have explicitly changed their class loader.
+#'
+#' @rdname spark.addJar
+#' @param path The path of the jar to be added
+#' @param addToCurrentClassLoader Whether to add the jar to the current driver classloader.
+#' Default is FALSE.
+#' @export
+#' @examples
+#'\dontrun{
+#' spark.addJar("/path/to/something.jar", TRUE)
+#'}
+#' @note spark.addJar since 2.2.0
+spark.addJar <- function(path, addToCurrentClassLoader = FALSE) {
--- End diff --

Done





[GitHub] spark issue #15666: [SPARK-11421] [Core][Python][R] Added ability for addJar...

2017-06-17 Thread mariusvniekerk
Github user mariusvniekerk commented on the issue:

https://github.com/apache/spark/pull/15666
  
@HyukjinKwon Any hints on what's needed to get the R stuff passing? I don't really have a Windows testbed that I can use.





[GitHub] spark issue #16766: [SPARK-19426][SQL] Custom coalesce for Dataset

2017-06-13 Thread mariusvniekerk
Github user mariusvniekerk commented on the issue:

https://github.com/apache/spark/pull/16766
  
Let me rebase this. I don't currently have a clean way of testing this on Windows.





[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-03-18 Thread mariusvniekerk
Github user mariusvniekerk commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r106781948
  
--- Diff: R/pkg/R/context.R ---
@@ -319,6 +319,34 @@ spark.addFile <- function(path, recursive = FALSE) {
   invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive))
 }
 
+
+#' Adds a JAR dependency for all tasks to be executed on this SparkContext in the future.
+#'
+#' The \code{path} passed can be either a local file, a file in HDFS (or other Hadoop-supported
+#' filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
+#' If \code{addToCurrentClassLoader} is true, add the jar to the current threads' classloader. In
+#' general adding to the current threads' class loader will impact all other application threads
+#' unless they have explicitly changed their class loader.
+#'
+#' @rdname spark.addJar
+#' @param path The path of the jar to be added
+#' @param addToCurrentClassLoader Whether to add the jar to the current driver classloader.
+#' Default is FALSE.
+#' @export
+#' @examples
+#'\dontrun{
+#' spark.addJar("/path/to/something.jar", TRUE)
+#'}
+#' @note spark.addJar since 2.2.0
+spark.addJar <- function(path, addToCurrentClassLoader = FALSE) {
+  sc <- getSparkContext()
+  normalizedPath <- suppressWarnings(normalizePath(path))
+  scala_sc <- callJMethod(sc, "sc")
+  invisible(callJMethod(scala_sc, "addJar", normalizedPath, addToCurrentClassLoader))
--- End diff --

Why is normalizePath doing that to the URL?





[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-03-08 Thread mariusvniekerk
Github user mariusvniekerk commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r104908469
  
--- Diff: R/pkg/R/context.R ---
@@ -319,6 +319,34 @@ spark.addFile <- function(path, recursive = FALSE) {
   invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive))
 }
 
+
+#' Adds a JAR dependency for all tasks to be executed on this SparkContext in the future.
+#'
+#' The \code{path} passed can be either a local file, a file in HDFS (or other Hadoop-supported
+#' filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
+#' If \code{addToCurrentClassLoader} is true, add the jar to the current threads' classloader. In
+#' general adding to the current threads' class loader will impact all other application threads
+#' unless they have explicitly changed their class loader.
+#'
+#' @rdname spark.addJar
+#' @param path The path of the jar to be added
+#' @param addToCurrentClassLoader Whether to add the jar to the current driver classloader.
+#' Default is FALSE.
+#' @export
+#' @examples
+#'\dontrun{
+#' spark.addJar("/path/to/something.jar", TRUE)
+#'}
+#' @note spark.addJar since 2.2.0
+spark.addJar <- function(path, addToCurrentClassLoader = FALSE) {
--- End diff --

Mostly for backwards compatibility. 





[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-03-06 Thread mariusvniekerk
Github user mariusvniekerk commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r104486665
  
--- Diff: R/pkg/inst/tests/testthat/test_context.R ---
@@ -167,6 +167,18 @@ test_that("spark.lapply should perform simple transforms", {
   sparkR.session.stop()
 })
 
+test_that("add jar should work and allow usage of the jar on the driver node", {
+  sparkR.sparkContext()
+
+  destDir <- paste0(tempdir(), "/", "testjar")
+  jarName <- callJStatic("org.apache.spark.TestUtils", "createDummyJar",
+                         destDir, "sparkrTests", "DummyClassForAddJarTest")
+
+  spark.addJar(jarName, addToCurrentClassLoader = TRUE)
+  testClass <- newJObject("sparkrTests.DummyClassForAddJarTest")
--- End diff --

Yeah, I suspect the Windows path didn't make it properly into the classloader.





[GitHub] spark issue #15666: [SPARK-11421] [Core][Python][R] Added ability for addJar...

2017-03-06 Thread mariusvniekerk
Github user mariusvniekerk commented on the issue:

https://github.com/apache/spark/pull/15666
  
Ah thanks





[GitHub] spark issue #15666: [SPARK-11421] [Core][Python][R] Added ability for addJar...

2017-03-03 Thread mariusvniekerk
Github user mariusvniekerk commented on the issue:

https://github.com/apache/spark/pull/15666
  
whoops.





[GitHub] spark issue #15666: [SPARK-11421] [Core][Python][R] Added ability for addJar...

2017-03-01 Thread mariusvniekerk
Github user mariusvniekerk commented on the issue:

https://github.com/apache/spark/pull/15666
  
Seems to be something in `pyspark.SparkContext.addJar:10: ERROR: Unexpected indentation.`?

What exactly does it want in that docstring?





[GitHub] spark issue #15666: [SPARK-11421] [Core][Python][R] Added ability for addJar...

2017-03-01 Thread mariusvniekerk
Github user mariusvniekerk commented on the issue:

https://github.com/apache/spark/pull/15666
  
@holdenk done





[GitHub] spark issue #15666: [SPARK-11421] [Core][Python][R] Added ability for addJar...

2017-02-28 Thread mariusvniekerk
Github user mariusvniekerk commented on the issue:

https://github.com/apache/spark/pull/15666
  
I'll see if I can rebase it tomorrow 





[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-02-08 Thread mariusvniekerk
Github user mariusvniekerk commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r100177099
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1802,19 +1802,34 @@ class SparkContext(config: SparkConf) extends Logging {
    * Adds a JAR dependency for all tasks to be executed on this `SparkContext` in the future.
    * @param path can be either a local file, a file in HDFS (or other Hadoop-supported filesystems),
    * an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
+   * If addToCurrentClassLoader is true, attempt to add the new class to the current threads' class
--- End diff --

Add to the doc that calling addJar again has no effect when the jar's URL is already present in the classloader.
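For context, the no-op behavior comes from java.net.URLClassLoader, whose addURL is documented to do nothing when the URL is already in its list. A minimal sketch of that (the MutableClassLoader subclass and jar path here are hypothetical, for illustration only):

    import java.net.{URL, URLClassLoader}

    // URLClassLoader.addURL is protected; expose it so jars can be added at
    // runtime, the same way Spark's mutable classloader does.
    class MutableClassLoader(parent: ClassLoader)
        extends URLClassLoader(Array.empty[URL], parent) {
      override def addURL(url: URL): Unit = super.addURL(url)
    }

    val loader = new MutableClassLoader(getClass.getClassLoader)
    val jar = new URL("file:/tmp/example.jar")  // hypothetical jar path
    loader.addURL(jar)
    loader.addURL(jar)                          // no effect: URL already present
    assert(loader.getURLs.length == 1)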





[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-02-08 Thread mariusvniekerk
Github user mariusvniekerk commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r100176188
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1802,19 +1802,34 @@ class SparkContext(config: SparkConf) extends Logging {
    * Adds a JAR dependency for all tasks to be executed on this `SparkContext` in the future.
    * @param path can be either a local file, a file in HDFS (or other Hadoop-supported filesystems),
    * an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
+   * If addToCurrentClassLoader is true, attempt to add the new class to the current threads' class
+   * loader. In general adding to the current threads' class loader will impact all other
+   * application threads unless they have explicitly changed their class loader.
    */
   def addJar(path: String) {
+    addJar(path, false)
+  }
+
+  def addJar(path: String, addToCurrentClassLoader: Boolean) {
     if (path == null) {
       logWarning("null specified as parameter to addJar")
     } else {
       var key = ""
-      if (path.contains("\\")) {
+
+      val uri = if (path.contains("\\")) {
         // For local paths with backslashes on Windows, URI throws an exception
-        key = env.rpcEnv.fileServer.addJar(new File(path))
+        new File(path).toURI
       } else {
         val uri = new URI(path)
         // SPARK-17650: Make sure this is a valid URL before adding it to the list of dependencies
         Utils.validateURL(uri)
+        uri
+      }
+
+      if (path.contains("\\")) {
+        // For local paths with backslashes on Windows, URI throws an exception
+        key = env.rpcEnv.fileServer.addJar(new File(uri))
--- End diff --

If we have backslashes, we are in a local path on Windows.
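The special case exists because java.net.URI rejects backslashes outright. A small sketch of the behavior being worked around, with a hypothetical path:

    import java.io.File
    import java.net.{URI, URISyntaxException}

    val windowsPath = "C:\\jars\\dep.jar"  // hypothetical Windows-style local path

    // new URI(path) throws, since backslash is an illegal URI character...
    try new URI(windowsPath)
    catch { case e: URISyntaxException => println(s"rejected: ${e.getMessage}") }

    // ...so the code falls back to File.toURI, which on Windows yields
    // file:/C:/jars/dep.jar and keeps the local-path case working.
    val uri: URI = new File(windowsPath).toURI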





[GitHub] spark issue #15821: [SPARK-13534][WIP][PySpark] Using Apache Arrow to increa...

2017-02-05 Thread mariusvniekerk
Github user mariusvniekerk commented on the issue:

https://github.com/apache/spark/pull/15821
  
Probably a good thing to look at is the R pieces, since those are effectively constrained to InternalRow.





[GitHub] spark issue #16766: [SPARK-19426][SQL] Custom coalesce for Dataset

2017-02-03 Thread mariusvniekerk
Github user mariusvniekerk commented on the issue:

https://github.com/apache/spark/pull/16766
  
@felixcheung This does not touch any of the coalesce internals. It only allows setting a partitionCoalescer, similar to what is already available in rdd.coalesce.
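For readers unfamiliar with the RDD-level hook: rdd.coalesce already accepts an optional PartitionCoalescer (added in SPARK-14042), and this PR only threads the same option through to the Dataset API. A rough sketch of a custom coalescer against that existing trait; the chunking strategy is invented for illustration:

    import org.apache.spark.rdd.{PartitionCoalescer, PartitionGroup, RDD}

    // A toy coalescer: group parent partitions into evenly sized chunks.
    class ChunkingCoalescer extends PartitionCoalescer with Serializable {
      override def coalesce(maxPartitions: Int, parent: RDD[_]): Array[PartitionGroup] = {
        val groupSize =
          math.max(1, math.ceil(parent.partitions.length.toDouble / maxPartitions).toInt)
        parent.partitions.grouped(groupSize).map { parts =>
          val group = new PartitionGroup()
          parts.foreach(group.partitions += _)
          group
        }.toArray
      }
    }

    // Existing RDD-level usage; the PR exposes the same option on Dataset.coalesce:
    // rdd.coalesce(10, shuffle = false, partitionCoalescer = Some(new ChunkingCoalescer))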





[GitHub] spark pull request #16766: [SPARK-19426][SQL] Custom coalesce for Dataset

2017-02-03 Thread mariusvniekerk
Github user mariusvniekerk commented on a diff in the pull request:

https://github.com/apache/spark/pull/16766#discussion_r99369813
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala ---
@@ -497,7 +496,9 @@ case class UnionExec(children: Seq[SparkPlan]) extends SparkPlan {
  * if you go from 1000 partitions to 100 partitions, there will not be a shuffle, instead each of
  * the 100 new partitions will claim 10 of the current partitions.
  */
-case class CoalesceExec(numPartitions: Int, child: SparkPlan) extends UnaryExecNode {
+case class CoalesceExec(numPartitions: Int, child: SparkPlan,
+    partitionCoalescer: Option[PartitionCoalescer]
+   ) extends UnaryExecNode {
--- End diff --

Do you guys have a .scalafmt.conf that applies all of this? That should make things cleaner.





[GitHub] spark pull request #16766: [SPARK-19426][SQL] Custom coalesce for Dataset

2017-02-03 Thread mariusvniekerk
Github user mariusvniekerk commented on a diff in the pull request:

https://github.com/apache/spark/pull/16766#discussion_r99366754
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
@@ -117,6 +134,34 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
       data: _*)
   }
 
+  test("coalesce, custom") {
+
+    val maxSplitSize = 512
+    // Similar to the implementation of `test("custom RDD coalescer")` from [[RDDSuite]] we first
+    // write out to disk, to ensure that our splits are in fact [[FileSplit]] instances.
+    val data = (1 to 1000).map(i => ClassData(i.toString, i))
+    data.toDS().repartition(10).write.format("csv").save(path.toString)
+
+    val ds = spark.read.format("csv").load(path.toString).as[ClassData]
--- End diff --

Oh right, csv doesn't do headers.





[GitHub] spark pull request #16766: [SPARK-19426][SQL] Custom coalesce for Dataset

2017-02-03 Thread mariusvniekerk
Github user mariusvniekerk commented on a diff in the pull request:

https://github.com/apache/spark/pull/16766#discussion_r99366143
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
@@ -17,24 +17,41 @@
 
 package org.apache.spark.sql
 
-import java.io.{Externalizable, ObjectInput, ObjectOutput}
+import java.io.{Externalizable, File, ObjectInput, ObjectOutput}
 import java.sql.{Date, Timestamp}
 
+import org.apache.hadoop.mapred.FileSplit
+import org.scalatest.BeforeAndAfter
+
+import org.apache.spark.rdd.{CoalescedRDDPartition, HadoopPartition, SizeBasedCoalescer}
 import org.apache.spark.sql.catalyst.encoders.{OuterScopes, RowEncoder}
 import org.apache.spark.sql.catalyst.util.sideBySide
-import org.apache.spark.sql.execution.{LogicalRDD, RDDScanExec, SortExec}
+import org.apache.spark.sql.execution.{LogicalRDD, RDDScanExec}
 import org.apache.spark.sql.execution.exchange.{BroadcastExchangeExec, ShuffleExchange}
 import org.apache.spark.sql.execution.streaming.MemoryStream
 import org.apache.spark.sql.functions._
 import org.apache.spark.sql.test.SharedSQLContext
 import org.apache.spark.sql.types._
+import org.apache.spark.util.Utils
 
 case class TestDataPoint(x: Int, y: Double, s: String, t: TestDataPoint2)
 case class TestDataPoint2(x: Int, s: String)
 
-class DatasetSuite extends QueryTest with SharedSQLContext {
+class DatasetSuite extends QueryTest with SharedSQLContext with BeforeAndAfter {
   import testImplicits._
 
+  private var path: File = null
+
+  override def beforeAll(): Unit = {
+    super.beforeAll()
+    path = Utils.createTempDir()
+    path.delete()
+  }
+
+  after {
+    Utils.deleteRecursively(path)
+  }
--- End diff --

Ah, thanks. I looked at the writer tests.





[GitHub] spark pull request #16766: [SPARK-19426][SQL] Custom coalesce for Dataset

2017-02-03 Thread mariusvniekerk
Github user mariusvniekerk commented on a diff in the pull request:

https://github.com/apache/spark/pull/16766#discussion_r99363149
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
@@ -823,6 +825,17 @@ case class Repartition(numPartitions: Int, shuffle: Boolean, child: LogicalPlan)
 }
 
 /**
+ * Returns a new RDD that has exactly `numPartitions` partitions.
+ */
+case class CoalesceLogical(numPartitions: Int, partitionCoalescer: Option[PartitionCoalescer],
--- End diff --

That sounds good.





[GitHub] spark pull request #16766: [SPARK-19426][SQL] Custom coalesce for Dataset

2017-02-02 Thread mariusvniekerk
Github user mariusvniekerk commented on a diff in the pull request:

https://github.com/apache/spark/pull/16766#discussion_r99132600
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
@@ -823,6 +825,17 @@ case class Repartition(numPartitions: Int, shuffle: Boolean, child: LogicalPlan)
 }
 
 /**
+  * Returns a new RDD that has exactly `numPartitions` partitions.
+  */
+case class CoalesceLogical(numPartitions: Int, partitionCoalescer: Option[PartitionCoalescer],
--- End diff --

The main reason is that there was already a Coalesce expression class.





[GitHub] spark issue #15666: [SPARK-11421] [Core][Python][R] Added ability for addJar...

2017-02-01 Thread mariusvniekerk
Github user mariusvniekerk commented on the issue:

https://github.com/apache/spark/pull/15666
  
Yeah I'll be there 





[GitHub] spark issue #15666: [SPARK-11421] [Core][Python][R] Added ability for addJar...

2017-02-01 Thread mariusvniekerk
Github user mariusvniekerk commented on the issue:

https://github.com/apache/spark/pull/15666
  
@holdenk Anything I can do from my side to help this guy along?





[GitHub] spark pull request #16766: [SPARK-19426][SQL] Custom coalesce for Dataset

2017-02-01 Thread mariusvniekerk
GitHub user mariusvniekerk opened a pull request:

https://github.com/apache/spark/pull/16766

[SPARK-19426][SQL] Custom coalesce for Dataset

## What changes were proposed in this pull request?

This adds support for using the PartitionCoalescer features added in #11865 (SPARK-14042) to the Dataset API.

## How was this patch tested?

Manual tests



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mariusvniekerk/spark wip_customCoalesce

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16766.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16766


commit 15b34dd88f81b20c1be1ef42e6b647d42ef5f462
Author: Marius van Niekerk 
Date:   2016-11-07T22:06:38Z

custom coalesce







[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2016-12-22 Thread mariusvniekerk
Github user mariusvniekerk commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r93730314
  
--- Diff: core/src/main/scala/org/apache/spark/TestUtils.scala ---
@@ -164,6 +164,27 @@ private[spark] object TestUtils {
 createCompiledClass(className, destDir, sourceFile, classpathUrls)
   }
 
+  /** Create a dummy compiled jar for a given package, classname. Jar will be placed in destDir */
+  def createDummyJar(destDir: String, packageName: String, className: String): String = {
--- End diff --

The R tests do indeed verify that they can call the internal functions.

I can revert that part of the changes.





[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2016-12-22 Thread mariusvniekerk
Github user mariusvniekerk commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r93729928
  
--- Diff: core/src/main/scala/org/apache/spark/TestUtils.scala ---
@@ -164,6 +164,27 @@ private[spark] object TestUtils {
 createCompiledClass(className, destDir, sourceFile, classpathUrls)
   }
 
+  /** Create a dummy compiled jar for a given package, classname. Jar will be placed in destDir */
+  def createDummyJar(destDir: String, packageName: String, className: String): String = {
--- End diff --

Yeah, when I wrote this that didn't exist yet. Changing.





[GitHub] spark issue #15666: [SPARK-11421] [Core][Python][R] Added ability for addJar...

2016-12-16 Thread mariusvniekerk
Github user mariusvniekerk commented on the issue:

https://github.com/apache/spark/pull/15666
  
Rebased.





[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2016-12-10 Thread mariusvniekerk
Github user mariusvniekerk closed the pull request at:

https://github.com/apache/spark/pull/15666





[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2016-12-10 Thread mariusvniekerk
GitHub user mariusvniekerk reopened a pull request:

https://github.com/apache/spark/pull/15666

[SPARK-11421] [Core][Python][R] Added ability for addJar to augment the current classloader

## What changes were proposed in this pull request?

Adds a flag to sc.addJar to add the jar to the current classloader
## How was this patch tested?

Unit tests, manual tests

This is a continuation of the pull request in https://github.com/apache/spark/pull/9313 and is mostly a rebase of that moved to master with SparkR additions.

cc @holdenk 


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mariusvniekerk/spark SPARK-11421

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15666.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15666


commit 6fb5d66e7669ebe0e8a515e02b1276e1bab652a2
Author: Marius van Niekerk 
Date:   2016-10-28T00:26:17Z

Squashed content from pull request #9313

commit 6a6e98a0fcc7f388009f36b8a31664bda2ccf5d9
Author: Marius van Niekerk 
Date:   2016-10-28T00:26:29Z

Remove _loadClass method since we dont need it anymore under py4j 0.10

commit 2b1e98e50feb7180b94f7b9e304634566f163718
Author: Marius van Niekerk 
Date:   2016-10-28T00:26:36Z

Expose addJar to sparkR as well

commit 7f37d3a060d574bd6c38539ec896fbc4c94060f3
Author: mariusvniekerk 
Date:   2016-10-29T13:24:40Z

Style fixes

commit 9d838b35b53b1e4fdcf39721b4f638ead9e40fcd
Author: Marius van Niekerk 
Date:   2016-10-29T20:15:32Z

Adjust test suite to test add jar in scala as well.

commit d4416d92610affd363701fd08dc53eb720566130
Author: Marius van Niekerk 
Date:   2016-10-29T21:19:16Z

Fixed scala test not working due to incorrect classloader being used.

commit fccb141dd9e6d36db242997f1c6f3e007caa514f
Author: Marius van Niekerk 
Date:   2016-10-30T00:15:27Z

Fixed typo with test.

commit 26b39de51f9a76b121ebcb70079072dfcc9972bd
Author: Marius van Niekerk 
Date:   2016-11-01T01:46:07Z

Fixed documentation.







[GitHub] spark issue #15821: [SPARK-13534][WIP][PySpark] Using Apache Arrow to increa...

2016-11-29 Thread mariusvniekerk
Github user mariusvniekerk commented on the issue:

https://github.com/apache/spark/pull/15821
  
So this is very cool stuff.

Would it be reasonable to add some API pieces so that, on the Python side, things like DataFrame.mapPartitions make use of Apache Arrow to lower the serialization costs? Or is that more a follow-on piece of work?





[GitHub] spark issue #15666: [SPARK-11421] [Core][Python][R] Added ability for addJar...

2016-11-01 Thread mariusvniekerk
Github user mariusvniekerk commented on the issue:

https://github.com/apache/spark/pull/15666
  
@HyukjinKwon There seems to be something weird with the AppVeyor checks?





[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2016-10-31 Thread mariusvniekerk
Github user mariusvniekerk commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r85865112
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1700,19 +1700,34 @@ class SparkContext(config: SparkConf) extends Logging {
    * Adds a JAR dependency for all tasks to be executed on this SparkContext in the future.
    * The `path` passed can be either a local file, a file in HDFS (or other Hadoop-supported
    * filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
+   * If addToCurrentClassLoader is true, attempt to add the new class to the current threads' class
+   * loader. In general adding to the current threads' class loader will impact all other
+   * application threads unless they have explicitly changed their class loader.
    */
   def addJar(path: String) {
+    addJar(path, false)
+  }
+
+  def addJar(path: String, addToCurrentClassLoader: Boolean) {
     if (path == null) {
       logWarning("null specified as parameter to addJar")
     } else {
       var key = ""
-      if (path.contains("\\")) {
+
+      val uri = if (path.contains("\\")) {
         // For local paths with backslashes on Windows, URI throws an exception
-        key = env.rpcEnv.fileServer.addJar(new File(path))
--- End diff --

So this change gets the URI for the Windows path, which is used later on to construct a File instance. That should allow the Windows special case to work.





[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2016-10-31 Thread mariusvniekerk
Github user mariusvniekerk commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r85833766
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1700,19 +1700,34 @@ class SparkContext(config: SparkConf) extends Logging {
    * Adds a JAR dependency for all tasks to be executed on this SparkContext in the future.
    * The `path` passed can be either a local file, a file in HDFS (or other Hadoop-supported
    * filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
+   * If addToCurrentClassLoader is true, attempt to add the new class to the current threads' class
+   * loader. In general adding to the current threads' class loader will impact all other
+   * application threads unless they have explicitly changed their class loader.
    */
   def addJar(path: String) {
+    addJar(path, false)
+  }
+
+  def addJar(path: String, addToCurrentClassLoader: Boolean) {
--- End diff --

Keeping it in Scala makes it simpler for other Spark Scala interpreters (e.g. Toree, Zeppelin) to make use of this.
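A hedged sketch of that interpreter use case, assuming the two-argument overload from this PR (the jar path and class name are hypothetical):

    // From a Scala interpreter (e.g. Toree, Zeppelin) with an active SparkContext sc:
    sc.addJar("/path/to/user-udfs.jar", addToCurrentClassLoader = true)

    // The jar is now visible to the driver's context classloader, so the
    // interpreter can instantiate classes from it directly.
    val cls = Thread.currentThread().getContextClassLoader.loadClass("com.example.MyUdf")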





[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2016-10-27 Thread mariusvniekerk
GitHub user mariusvniekerk opened a pull request:

https://github.com/apache/spark/pull/15666

[SPARK-11421] [Core][Python][R] Added ability for addJar to augment the current classloader

## What changes were proposed in this pull request?

Adds a flag to sc.addJar to add the jar to the current classloader

## How was this patch tested?

Unit tests, manual tests

This is a continuation of the pull request in https://github.com/apache/spark/pull/9313 and is mostly a rebase of that moved to master with SparkR additions.

cc @holdenk 


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mariusvniekerk/spark SPARK-11421

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15666.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15666


commit 6fb5d66e7669ebe0e8a515e02b1276e1bab652a2
Author: Marius van Niekerk 
Date:   2016-10-28T00:26:17Z

Squashed content from pull request #9313

commit 6a6e98a0fcc7f388009f36b8a31664bda2ccf5d9
Author: Marius van Niekerk 
Date:   2016-10-28T00:26:29Z

Remove _loadClass method since we dont need it anymore under py4j 0.10

commit 2b1e98e50feb7180b94f7b9e304634566f163718
Author: Marius van Niekerk 
Date:   2016-10-28T00:26:36Z

Expose addJar to sparkR as well







[GitHub] spark issue #9313: [SPARK-10658][SPARK-11421][PYSPARK][CORE] Provide add jar...

2016-10-27 Thread mariusvniekerk
Github user mariusvniekerk commented on the issue:

https://github.com/apache/spark/pull/9313
  
So since py4j now uses the context classloader, we can remove the Python pieces about loading a class by name.

@holdenk If you want, I can revisit this PR.

This case occurs for me specifically because I have Python modules that bundle their jars with them, and when using spark-submit it is rather tedious to have to manually muck around with the classloader under Python.

We can probably also add it to SparkR, since I assume it has similar requirements to the PySpark side.





[GitHub] spark pull request: [SPARK-11881][SQL] Fix for postgresql fetchsiz...

2015-11-23 Thread mariusvniekerk
Github user mariusvniekerk commented on a diff in the pull request:

https://github.com/apache/spark/pull/9861#discussion_r45620973
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala ---
@@ -489,6 +492,13 @@ private[sql] class JDBCRDD(
       }
       try {
         if (null != conn) {
+          if (!conn.getAutoCommit && !conn.isClosed) {
+            try {
+              conn.commit()
+            } catch {
+              case e: Exception => logWarning("Exception committing transaction", e)
--- End diff --

Do we want to do anything special for Throwable vs. Exception, or just change it?
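One common middle ground, offered here as a hedged sketch rather than what the PR settled on, is scala.util.control.NonFatal, which catches ordinary exceptions while letting truly fatal throwables propagate. The safeCommit helper and its logWarning parameter are hypothetical stand-ins for the surrounding JDBCRDD code:

    import java.sql.Connection
    import scala.util.control.NonFatal

    // Hypothetical helper mirroring the diff above: commit, but never let an
    // ordinary failure escape the cleanup path.
    def safeCommit(conn: Connection, logWarning: (String, Throwable) => Unit): Unit = {
      try {
        conn.commit()
      } catch {
        // NonFatal matches plain exceptions; VirtualMachineError, InterruptedException,
        // LinkageError and ControlThrowable still propagate.
        case NonFatal(e) => logWarning("Exception committing transaction", e)
      }
    }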





[GitHub] spark pull request: [SPARK-11881][SQL] Fix for postgresql fetchsiz...

2015-11-21 Thread mariusvniekerk
Github user mariusvniekerk commented on the pull request:

https://github.com/apache/spark/pull/9861#issuecomment-158701845
  
Not entirely sure why this causes NPEs in some of the unit tests...





[GitHub] spark pull request: [SPARK-11881][SQL] Fix for postgresql fetchsiz...

2015-11-20 Thread mariusvniekerk
GitHub user mariusvniekerk opened a pull request:

https://github.com/apache/spark/pull/9861

[SPARK-11881][SQL] Fix for postgresql fetchsize > 0

Reference: https://jdbc.postgresql.org/documentation/head/query.html#query-with-cursor

In order for PostgreSQL to honor a non-zero fetchSize setting, its Connection.autoCommit needs to be set to false. Otherwise, it will just quietly ignore the fetchSize setting.

This adds a new side-effecting, dialect-specific beforeFetch method that will fire before a select query is run.
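A minimal sketch of the cursor-based fetch described in the referenced PostgreSQL documentation, which is what the beforeFetch hook enables (connection details are hypothetical):

    import java.sql.DriverManager

    val conn = DriverManager.getConnection(
      "jdbc:postgresql://localhost/testdb", "user", "secret")  // hypothetical connection

    conn.setAutoCommit(false)  // required: with autoCommit on, PostgreSQL silently ignores fetchSize
    val stmt = conn.prepareStatement("SELECT * FROM big_table")
    stmt.setFetchSize(1000)    // rows now stream through a cursor, 1000 at a time
    val rs = stmt.executeQuery()
    while (rs.next()) { /* process each row */ }
    conn.commit()              // close out the transaction the cursor required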

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mariusvniekerk/spark SPARK-11881

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/9861.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #9861


commit 464976670c2857a972b38aa4d32396915d7e0c0a
Author: mariusvniekerk 
Date:   2015-11-20T14:34:31Z

[SPARK-11881][SQL] Fix for postgresql fetchsize > 0

Reference: https://jdbc.postgresql.org/documentation/head/query.html#query-with-cursor
In order for PostgreSQL to honor the fetchSize non-zero setting, its Connection.autoCommit needs to be set to false. Otherwise, it will just quietly ignore the fetchSize setting.

This adds a new side-effecting dialect specific beforeFetch method that will fire before a select query is ran.







[GitHub] spark pull request: [SPARK-10186][SQL] support postgre array type ...

2015-11-12 Thread mariusvniekerk
Github user mariusvniekerk commented on the pull request:

https://github.com/apache/spark/pull/9662#issuecomment-156285098
  
These test failures don't seem to be related?





[GitHub] spark pull request: [SPARK-10186][SQL] support postgre array type ...

2015-11-12 Thread mariusvniekerk
Github user mariusvniekerk commented on a diff in the pull request:

https://github.com/apache/spark/pull/9662#discussion_r44664911
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala ---
@@ -121,6 +145,12 @@ object JdbcUtils extends Logging {
         case TimestampType => stmt.setTimestamp(i + 1, row.getAs[java.sql.Timestamp](i))
         case DateType => stmt.setDate(i + 1, row.getAs[java.sql.Date](i))
         case t: DecimalType => stmt.setBigDecimal(i + 1, row.getDecimal(i))
+        case ArrayType(et, _) =>
+          assert(jdbcTypes(i).databaseTypeDefinition.endsWith("[]"))
--- End diff --

Is that the same in all backends that support arrays (Oracle etc)?





[GitHub] spark pull request: [SPARK-10186][SQL] support postgre array type ...

2015-11-12 Thread mariusvniekerk
Github user mariusvniekerk commented on the pull request:

https://github.com/apache/spark/pull/9662#issuecomment-156121867
  
I've added write support in #9137 as well, if you want to just use it from there.





[GitHub] spark pull request: [SPARK-10186][SQL] Array types using JDBCRDD a...

2015-11-06 Thread mariusvniekerk
Github user mariusvniekerk commented on the pull request:

https://github.com/apache/spark/pull/9137#issuecomment-154415116
  
@JoshRosen Guess it's refactor time due to SPARK-11541.





[GitHub] spark pull request: [SPARK-10186][SQL] Array types using JDBCRDD a...

2015-10-30 Thread mariusvniekerk
Github user mariusvniekerk commented on the pull request:

https://github.com/apache/spark/pull/9137#issuecomment-152561158
  
Is the best approach to rebase or just merge master into this and resolve conflicts?





[GitHub] spark pull request: [SPARK-10186][SQL] Array types using JDBCRDD a...

2015-10-29 Thread mariusvniekerk
Github user mariusvniekerk commented on the pull request:

https://github.com/apache/spark/pull/9137#issuecomment-152309023
  
I also need to rebase this thing against master again, it seems.





[GitHub] spark pull request: [SPARK-10186][SQL] Array types using JDBCRDD a...

2015-10-29 Thread mariusvniekerk
Github user mariusvniekerk commented on a diff in the pull request:

https://github.com/apache/spark/pull/9137#discussion_r43439253
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala ---
@@ -121,6 +122,21 @@ object JdbcUtils extends Logging {
         case TimestampType => stmt.setTimestamp(i + 1, row.getAs[java.sql.Timestamp](i))
         case DateType => stmt.setDate(i + 1, row.getAs[java.sql.Date](i))
         case t: DecimalType => stmt.setBigDecimal(i + 1, row.getDecimal(i))
--- End diff --

If the particular dialect does not support these types, saveTable should throw an exception when building the nullTypes array.





[GitHub] spark pull request: [SPARK-10186][SQL] Array types using JDBCRDD a...

2015-10-29 Thread mariusvniekerk
Github user mariusvniekerk commented on a diff in the pull request:

https://github.com/apache/spark/pull/9137#discussion_r43437880
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala ---
@@ -171,21 +187,9 @@ object JdbcUtils extends Logging {
       val name = field.name
       val typ: String =
         dialect.getJDBCType(field.dataType).map(_.databaseTypeDefinition).getOrElse(
-  field.dataType match {
--- End diff --

Moved this one so that I could get access to it in PostgresDialect.getJDBCType, in order to build representations for array fields.





[GitHub] spark pull request: [SPARK-10186][SQL] Array types using JDBCRDD a...

2015-10-29 Thread mariusvniekerk
Github user mariusvniekerk commented on a diff in the pull request:

https://github.com/apache/spark/pull/9137#discussion_r43437555
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala ---
@@ -207,6 +225,25 @@ case object PostgresDialect extends JdbcDialect {
       Some(StringType)
     } else if (sqlType == Types.OTHER && typeName.equals("jsonb")) {
       Some(StringType)
+    } else if (sqlType == Types.OTHER && typeName.equals("uuid")) {
+      Some(StringType)
+    } else if (sqlType == Types.ARRAY) {
+      typeName match {
--- End diff --

The underscores are specifically for the array types. Postgres prepends them to all array type names here: https://github.com/pgjdbc/pgjdbc/blob/REL9_4_1204/org/postgresql/jdbc2/TypeInfoCache.java#L159
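To make the convention concrete, here is a hedged sketch of the kind of mapping the dialect performs; the exact set of element types handled in this PR may differ:

    import org.apache.spark.sql.types._

    // PostgreSQL reports an array column's type name as the element type
    // prefixed with an underscore, e.g. int4[] arrives as "_int4".
    def postgresArrayType(typeName: String): Option[DataType] = typeName match {
      case "_int4"              => Some(ArrayType(IntegerType))
      case "_int8"              => Some(ArrayType(LongType))
      case "_float8"            => Some(ArrayType(DoubleType))
      case "_text" | "_varchar" => Some(ArrayType(StringType))
      case _                    => None
    }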





[GitHub] spark pull request: [SPARK-10186][SQL] Array types using JDBCRDD a...

2015-10-21 Thread mariusvniekerk
Github user mariusvniekerk commented on the pull request:

https://github.com/apache/spark/pull/9137#issuecomment-149907705
  
I'll add tests once #8101 is merged in





[GitHub] spark pull request: [SPARK-10186][SQL] Array types using JDBCRDD a...

2015-10-21 Thread mariusvniekerk
Github user mariusvniekerk commented on the pull request:

https://github.com/apache/spark/pull/9137#issuecomment-149857945
  
Sure. Had to refactor a little to work around type erasure warnings.





[GitHub] spark pull request: [SPARK-5753] [SQL] add JDBCRDD support for pos...

2015-10-20 Thread mariusvniekerk
Github user mariusvniekerk commented on the pull request:

https://github.com/apache/spark/pull/4549#issuecomment-149619987
  
I've given this a shot in https://github.com/apache/spark/pull/9137





[GitHub] spark pull request: [SPARK-10186][SQL] Array types using JDBCRDD a...

2015-10-19 Thread mariusvniekerk
Github user mariusvniekerk commented on the pull request:

https://github.com/apache/spark/pull/9137#issuecomment-149393173
  
Still need to add some additional types from https://github.com/pgjdbc/pgjdbc/blob/master/org/postgresql/jdbc2/TypeInfoCache.java#L70





[GitHub] spark pull request: [SPARK-10186][SQL] Array types using JDBCRDD a...

2015-10-15 Thread mariusvniekerk
GitHub user mariusvniekerk opened a pull request:

https://github.com/apache/spark/pull/9137

[SPARK-10186][SQL] Array types using JDBCRDD and postgres

This change allows reading from JDBC array column types for the PostgreSQL dialect.

This also opens up some implementation for array types using other JDBC backends.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mariusvniekerk/spark SPARK-10186

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/9137.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #9137


commit cf6a22b9e043671f0b0a70867a9c121f25ca6eca
Author: mariusvniekerk 
Date:   2015-10-15T17:37:32Z

[SPARK-10186] [SQL] Add support for array types using JDBCRDD and postgres

This change allows reading from jdbc array column types for the postgresql dialect.

This also opens up some implementation for array types using other jdbc backends.



