date:20170320

[GitHub] spark issue #17311: [SPARK-19970][SQL] Table owner should be USER instead of...

2017-03-20 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/17311
  
The difference is due to 
https://github.com/apache/spark/commit/3881f342b49efdb1e0d5ee27f616451ea1928c5d#diff-6fd847124f8eae45ba2de1cf7d6296feR855
 .


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17295: [SPARK-19556][core] Do not encrypt block manager ...

2017-03-20 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/17295#discussion_r106962007
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala ---
@@ -34,6 +34,8 @@ import org.apache.spark.util.{ShutdownHookManager, Utils}
  */
 private[spark] class DiskBlockManager(conf: SparkConf, deleteFilesOnStop: 
Boolean) extends Logging {
 
+  private val METADATA_FILE_SUFFIX = ".meta"
--- End diff --

Hmm, good point... there's currently no metadata kept in the `DiskStore` 
class, but then this shouldn't be a lot of data.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17295: [SPARK-19556][core] Do not encrypt block manager ...

2017-03-20 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/17295#discussion_r106962403
  
--- Diff: core/src/main/scala/org/apache/spark/storage/DiskStore.scala ---
@@ -17,48 +17,61 @@
 
 package org.apache.spark.storage
 
-import java.io.{FileOutputStream, IOException, RandomAccessFile}
+import java.io._
 import java.nio.ByteBuffer
+import java.nio.channels.{Channels, ReadableByteChannel, 
WritableByteChannel}
 import java.nio.channels.FileChannel.MapMode
+import java.nio.charset.StandardCharsets.UTF_8
 
-import com.google.common.io.Closeables
+import scala.collection.mutable.ListBuffer
 
-import org.apache.spark.SparkConf
+import com.google.common.io.{ByteStreams, Closeables, Files}
+import io.netty.channel.FileRegion
+import io.netty.util.AbstractReferenceCounted
+
+import org.apache.spark.{SecurityManager, SparkConf}
 import org.apache.spark.internal.Logging
-import org.apache.spark.util.Utils
+import org.apache.spark.network.buffer.ManagedBuffer
+import org.apache.spark.security.CryptoStreamUtils
+import org.apache.spark.util.{ByteBufferInputStream, Utils}
 import org.apache.spark.util.io.ChunkedByteBuffer
 
 /**
  * Stores BlockManager blocks on disk.
  */
-private[spark] class DiskStore(conf: SparkConf, diskManager: 
DiskBlockManager) extends Logging {
+private[spark] class DiskStore(
+conf: SparkConf,
+diskManager: DiskBlockManager,
+securityManager: SecurityManager) extends Logging {
 
   private val minMemoryMapBytes = 
conf.getSizeAsBytes("spark.storage.memoryMapThreshold", "2m")
 
   def getSize(blockId: BlockId): Long = {
-diskManager.getFile(blockId.name).length
+val file = diskManager.getMetadataFile(blockId)
+Files.toString(file, UTF_8).toLong
   }
 
   /**
* Invokes the provided callback function to write the specific block.
*
* @throws IllegalStateException if the block already exists in the disk 
store.
*/
-  def put(blockId: BlockId)(writeFunc: FileOutputStream => Unit): Unit = {
+  def put(blockId: BlockId)(writeFunc: WritableByteChannel => Unit): Unit 
= {
 if (contains(blockId)) {
   throw new IllegalStateException(s"Block $blockId is already present 
in the disk store")
 }
 logDebug(s"Attempting to put block $blockId")
 val startTime = System.currentTimeMillis
 val file = diskManager.getFile(blockId)
-val fileOutputStream = new FileOutputStream(file)
+val out = new CountingWritableChannel(openForWrite(file))
 var threwException: Boolean = true
 try {
-  writeFunc(fileOutputStream)
+  writeFunc(out)
+  Files.write(out.getCount().toString(), 
diskManager.getMetadataFile(blockId), UTF_8)
   threwException = false
 } finally {
   try {
-Closeables.close(fileOutputStream, threwException)
+Closeables.close(out, threwException)
--- End diff --

This was the previous behavior, but well, doesn't hurt to fix it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14617: [SPARK-17019][Core] Expose on-heap and off-heap memory u...

2017-03-20 Thread tgravescs

Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/14617
  
while I kind of like the hover because it doesn't clutter the page, it does 
bring up a couple concerns:
- user can't sort by them
- user might not know to hover (none of the other pages that I know of show 
useful values like this)

The latter one is perhaps the one that is more concerning.  Unless we can 
highlight it or somehow draw the users attention to it they will not know its 
there.   But the first one is also pretty valid. If I have thousands of 
executors and looking for one that is out of off heap memory, its pretty 
difficult to hover over all of them.

I don't see any changes to the Storage UI page, do we want to add something 
there as well?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17274: [SPARK-19925][SPARKR] Fix SparkR spark.getSparkFi...

2017-03-20 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17274#discussion_r106963651
  
--- Diff: R/pkg/inst/tests/testthat/test_context.R ---
@@ -177,6 +177,12 @@ test_that("add and get file to be downloaded with 
Spark job on every node", {
   spark.addFile(path)
   download_path <- spark.getSparkFiles(filename)
   expect_equal(readLines(download_path), words)
+
+  # Test spark.getSparkFiles works well on executors.
+  seq <- seq(from = 1, to = 10, length.out = 5)
+  f <- function(seq) { spark.getSparkFiles(filename) }
+  spark.lapply(seq, f)
--- End diff --

I think we should at least check the return value from spark.lapply


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17295: [SPARK-19556][core] Do not encrypt block manager ...

2017-03-20 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/17295#discussion_r106964049
  
--- Diff: core/src/main/scala/org/apache/spark/storage/DiskStore.scala ---
@@ -73,55 +86,219 @@ private[spark] class DiskStore(conf: SparkConf, 
diskManager: DiskBlockManager) e
   }
 
   def putBytes(blockId: BlockId, bytes: ChunkedByteBuffer): Unit = {
-put(blockId) { fileOutputStream =>
-  val channel = fileOutputStream.getChannel
-  Utils.tryWithSafeFinally {
-bytes.writeFully(channel)
-  } {
-channel.close()
-  }
+put(blockId) { channel =>
+  bytes.writeFully(channel)
 }
   }
 
-  def getBytes(blockId: BlockId): ChunkedByteBuffer = {
+  def getBytes(blockId: BlockId): BlockData = {
 val file = diskManager.getFile(blockId.name)
-val channel = new RandomAccessFile(file, "r").getChannel
-Utils.tryWithSafeFinally {
-  // For small files, directly read rather than memory map
-  if (file.length < minMemoryMapBytes) {
-val buf = ByteBuffer.allocate(file.length.toInt)
-channel.position(0)
-while (buf.remaining() != 0) {
-  if (channel.read(buf) == -1) {
-throw new IOException("Reached EOF before filling buffer\n" +
-  
s"offset=0\nfile=${file.getAbsolutePath}\nbuf.remaining=${buf.remaining}")
+val blockSize = getSize(blockId)
+
+securityManager.getIOEncryptionKey() match {
+  case Some(key) =>
+// Encrypted blocks cannot be memory mapped; return a special 
object that does decryption
+// and provides InputStream / FileRegion implementations for 
reading the data.
+new EncryptedBlockData(file, blockSize, conf, key)
+
+  case _ =>
+val channel = new FileInputStream(file).getChannel()
+if (blockSize < minMemoryMapBytes) {
+  // For small files, directly read rather than memory map.
+  Utils.tryWithSafeFinally {
+val buf = ByteBuffer.allocate(blockSize.toInt)
+while (buf.remaining() > 0) {
+  channel.read(buf)
+}
+buf.flip()
+new ByteBufferBlockData(new ChunkedByteBuffer(buf))
+  } {
+channel.close()
+  }
+} else {
+  Utils.tryWithSafeFinally {
+new ByteBufferBlockData(
+  new ChunkedByteBuffer(channel.map(MapMode.READ_ONLY, 0, 
file.length)))
+  } {
+channel.close()
   }
 }
-buf.flip()
-new ChunkedByteBuffer(buf)
-  } else {
-new ChunkedByteBuffer(channel.map(MapMode.READ_ONLY, 0, 
file.length))
-  }
-} {
-  channel.close()
 }
   }
 
   def remove(blockId: BlockId): Boolean = {
 val file = diskManager.getFile(blockId.name)
-if (file.exists()) {
-  val ret = file.delete()
-  if (!ret) {
-logWarning(s"Error deleting ${file.getPath()}")
+val meta = diskManager.getMetadataFile(blockId)
+
+def delete(f: File): Boolean = {
+  if (f.exists()) {
+val ret = f.delete()
+if (!ret) {
+  logWarning(s"Error deleting ${file.getPath()}")
+}
+
+ret
+  } else {
+false
   }
-  ret
-} else {
-  false
 }
+
+delete(file) & delete(meta)
   }
 
   def contains(blockId: BlockId): Boolean = {
 val file = diskManager.getFile(blockId.name)
 file.exists()
   }
+
+  private def openForWrite(file: File): WritableByteChannel = {
+val out = new FileOutputStream(file).getChannel()
+try {
+  securityManager.getIOEncryptionKey().map { key =>
+CryptoStreamUtils.createWritableChannel(out, conf, key)
+  }.getOrElse(out)
+} catch {
+  case e: Exception =>
+out.close()
--- End diff --

There might be exceptions specific for the commons-crypto library being 
thrown.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17295: [SPARK-19556][core] Do not encrypt block manager ...

2017-03-20 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/17295#discussion_r106965268
  
--- Diff: core/src/main/scala/org/apache/spark/storage/DiskStore.scala ---
@@ -73,55 +86,219 @@ private[spark] class DiskStore(conf: SparkConf, 
diskManager: DiskBlockManager) e
   }
 
   def putBytes(blockId: BlockId, bytes: ChunkedByteBuffer): Unit = {
-put(blockId) { fileOutputStream =>
-  val channel = fileOutputStream.getChannel
-  Utils.tryWithSafeFinally {
-bytes.writeFully(channel)
-  } {
-channel.close()
-  }
+put(blockId) { channel =>
+  bytes.writeFully(channel)
 }
   }
 
-  def getBytes(blockId: BlockId): ChunkedByteBuffer = {
+  def getBytes(blockId: BlockId): BlockData = {
 val file = diskManager.getFile(blockId.name)
-val channel = new RandomAccessFile(file, "r").getChannel
-Utils.tryWithSafeFinally {
-  // For small files, directly read rather than memory map
-  if (file.length < minMemoryMapBytes) {
-val buf = ByteBuffer.allocate(file.length.toInt)
-channel.position(0)
-while (buf.remaining() != 0) {
-  if (channel.read(buf) == -1) {
-throw new IOException("Reached EOF before filling buffer\n" +
-  
s"offset=0\nfile=${file.getAbsolutePath}\nbuf.remaining=${buf.remaining}")
+val blockSize = getSize(blockId)
+
+securityManager.getIOEncryptionKey() match {
+  case Some(key) =>
+// Encrypted blocks cannot be memory mapped; return a special 
object that does decryption
+// and provides InputStream / FileRegion implementations for 
reading the data.
+new EncryptedBlockData(file, blockSize, conf, key)
+
+  case _ =>
+val channel = new FileInputStream(file).getChannel()
+if (blockSize < minMemoryMapBytes) {
+  // For small files, directly read rather than memory map.
+  Utils.tryWithSafeFinally {
+val buf = ByteBuffer.allocate(blockSize.toInt)
+while (buf.remaining() > 0) {
+  channel.read(buf)
+}
+buf.flip()
+new ByteBufferBlockData(new ChunkedByteBuffer(buf))
+  } {
+channel.close()
+  }
+} else {
+  Utils.tryWithSafeFinally {
+new ByteBufferBlockData(
+  new ChunkedByteBuffer(channel.map(MapMode.READ_ONLY, 0, 
file.length)))
+  } {
+channel.close()
   }
 }
-buf.flip()
-new ChunkedByteBuffer(buf)
-  } else {
-new ChunkedByteBuffer(channel.map(MapMode.READ_ONLY, 0, 
file.length))
-  }
-} {
-  channel.close()
 }
   }
 
   def remove(blockId: BlockId): Boolean = {
 val file = diskManager.getFile(blockId.name)
-if (file.exists()) {
-  val ret = file.delete()
-  if (!ret) {
-logWarning(s"Error deleting ${file.getPath()}")
+val meta = diskManager.getMetadataFile(blockId)
+
+def delete(f: File): Boolean = {
+  if (f.exists()) {
+val ret = f.delete()
+if (!ret) {
+  logWarning(s"Error deleting ${file.getPath()}")
+}
+
+ret
+  } else {
+false
   }
-  ret
-} else {
-  false
 }
+
+delete(file) & delete(meta)
   }
 
   def contains(blockId: BlockId): Boolean = {
 val file = diskManager.getFile(blockId.name)
 file.exists()
   }
+
+  private def openForWrite(file: File): WritableByteChannel = {
+val out = new FileOutputStream(file).getChannel()
+try {
+  securityManager.getIOEncryptionKey().map { key =>
+CryptoStreamUtils.createWritableChannel(out, conf, key)
+  }.getOrElse(out)
+} catch {
+  case e: Exception =>
+out.close()
+throw e
+}
+  }
+
+}
+
+private class EncryptedBlockData(
+file: File,
+blockSize: Long,
+conf: SparkConf,
+key: Array[Byte]) extends BlockData {
+
+  override def toInputStream(): InputStream = 
Channels.newInputStream(open())
+
+  override def toManagedBuffer(): ManagedBuffer = new 
EncryptedManagedBuffer()
+
+  override def toByteBuffer(allocator: Int => ByteBuffer): 
ChunkedByteBuffer = {
+val source = open()
+try {
+  var remaining = blockSize
+  val chunks = new ListBuffer[ByteBuffer]()
+  while (remaining > 0) {
+val chunkSize = math.min(remaining, Int.MaxValue)
+val chunk = allocator(chunkSi

[GitHub] spark pull request #17274: [SPARK-19925][SPARKR] Fix SparkR spark.getSparkFi...

2017-03-20 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17274#discussion_r106963159
  
--- Diff: R/pkg/R/context.R ---
@@ -345,7 +351,7 @@ spark.getSparkFilesRootDirectory <- function() {
 #'}
 #' @note spark.getSparkFiles since 2.1.0
 spark.getSparkFiles <- function(fileName) {
-  callJStatic("org.apache.spark.SparkFiles", "get", as.character(fileName))
+  file.path(spark.getSparkFilesRootDirectory(), as.character(fileName))
--- End diff --

perhaps to check, does getSparkFilesRootDirectory returns something that 
file.path can handle, for instance, `file://` wouldn't work properly 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17274: [SPARK-19925][SPARKR] Fix SparkR spark.getSparkFi...

2017-03-20 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17274#discussion_r106963388
  
--- Diff: R/pkg/R/context.R ---
@@ -345,7 +351,7 @@ spark.getSparkFilesRootDirectory <- function() {
 #'}
 #' @note spark.getSparkFiles since 2.1.0
 spark.getSparkFiles <- function(fileName) {
-  callJStatic("org.apache.spark.SparkFiles", "get", as.character(fileName))
+  file.path(spark.getSparkFilesRootDirectory(), as.character(fileName))
--- End diff --

or that fileName can be something file.path doesn't handle but 
SparkFiles.get() can? like an absolute path?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17362: [SPARK-20033][SQL] support hive permanent function

2017-03-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17362
  
**[Test build #74890 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74890/testReport)**
 for PR 17362 at commit 
[`9325a7c`](https://github.com/apache/spark/commit/9325a7c2a4ce9d0db6c39d7781e517082a2fe4be).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17295: [SPARK-19556][core] Do not encrypt block manager ...

2017-03-20 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/17295#discussion_r106963546
  
--- Diff: core/src/main/scala/org/apache/spark/storage/DiskStore.scala ---
@@ -73,55 +86,219 @@ private[spark] class DiskStore(conf: SparkConf, 
diskManager: DiskBlockManager) e
   }
 
   def putBytes(blockId: BlockId, bytes: ChunkedByteBuffer): Unit = {
-put(blockId) { fileOutputStream =>
-  val channel = fileOutputStream.getChannel
-  Utils.tryWithSafeFinally {
-bytes.writeFully(channel)
-  } {
-channel.close()
-  }
+put(blockId) { channel =>
+  bytes.writeFully(channel)
--- End diff --

The channel is owned by the code in the `put` method, which does that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17363: [SPARK-19970][SQL][BRANCH-2.1] Table owner should...

2017-03-20 Thread dongjoon-hyun

GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/17363

[SPARK-19970][SQL][BRANCH-2.1] Table owner should be USER instead of 
PRINCIPAL in kerberized clusters

## What changes were proposed in this pull request?

In the kerberized hadoop cluster, when Spark creates tables, the owner of 
tables are filled with PRINCIPAL strings instead of USER names. This is 
inconsistent with Hive and causes problems when using 
[ROLE](https://cwiki.apache.org/confluence/display/Hive/SQL+Standard+Based+Hive+Authorization)
 in Hive. We had better to fix this.

**BEFORE**
```scala
scala> sql("create table t(a int)").show
scala> sql("desc formatted t").show(false)
...
|Owner:  |sp...@example.com 
|   |
```

**AFTER**
```scala
scala> sql("create table t(a int)").show
scala> sql("desc formatted t").show(false)
...
|Owner:  |spark 
|   |
```

## How was this patch tested?

Manually do `create table` and `desc formatted` because this happens in 
Kerberized clusters.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-19970-2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17363.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17363






---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17363: [SPARK-19970][SQL][BRANCH-2.1] Table owner should be USE...

2017-03-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17363
  
**[Test build #74891 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74891/testReport)**
 for PR 17363 at commit 
[`1328f1d`](https://github.com/apache/spark/commit/1328f1d204991f4b4a993a523bd01584bab3122c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17326: [SPARK-19985][ML] Fixed copy method for some ML M...

2017-03-20 Thread BryanCutler

Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/17326#discussion_r106969492
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifierSuite.scala
 ---
@@ -74,6 +74,7 @@ class MultilayerPerceptronClassifierSuite
   .setMaxIter(100)
   .setSolver("l-bfgs")
 val model = trainer.fit(dataset)
+MLTestingUtils.checkCopy(model)
--- End diff --

Thanks @hhbyyh , I agree that this is maybe not the best general way to 
test it, but this is how it is done in every other `Model` test.  Maybe this is 
ok for now and we could follow up with a discussion on the best way to check 
"basic" ML functionality?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17357: [SPARK-20025][CORE] Fix spark's driver failover mechanis...

2017-03-20 Thread markhamstra

Github user markhamstra commented on the issue:

https://github.com/apache/spark/pull/17357
  
Please change the title of this PR. "Fixed foo" is nearly useless when 
scanning the commit log in the future since it doesn't tell us anything about 
either the nature of the problem or the actual code change.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17321: [SPARK-19899][ML] Replace featuresCol with itemsC...

2017-03-20 Thread jkbradley

Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/17321#discussion_r106970923
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala ---
@@ -37,7 +37,20 @@ import org.apache.spark.sql.types._
 /**
  * Common params for FPGrowth and FPGrowthModel
  */
-private[fpm] trait FPGrowthParams extends Params with HasFeaturesCol with 
HasPredictionCol {
+private[fpm] trait FPGrowthParams extends Params with HasPredictionCol {
+
+  /**
+   * Items column name.
+   * Default: "items"
+   * @group param
+   */
+  @Since("2.2.0")
--- End diff --

I think it's OK to have it here.  Only FPGrowth and FPGrowthModel will ever 
inherit from FPGrowthParams.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17287: [SPARK-19945][SQL]add test suite for SessionCatalog with...

2017-03-20 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17287
  
I addressed these comments in the PR 
https://github.com/apache/spark/pull/17354


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17354: [SPARK-20024] [SQL] [test-maven] SessionCatalog API setC...

2017-03-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17354
  
**[Test build #74892 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74892/testReport)**
 for PR 17354 at commit 
[`31f1099`](https://github.com/apache/spark/commit/31f1099e4c926cbfdf0eb74e999be051e7257d17).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17321: [SPARK-19899][ML] Replace featuresCol with itemsCol in m...

2017-03-20 Thread jkbradley

Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/17321
  
LGTM
Thanks for the PR!
Merging with master


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17339: [SPARK-20010][SQL] Sort information is lost after sort m...

2017-03-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17339
  
**[Test build #74888 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74888/testReport)**
 for PR 17339 at commit 
[`9d2ab54`](https://github.com/apache/spark/commit/9d2ab549f19427ffc71ef9cef0517c080b71e31e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class SortOrder(`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17339: [SPARK-20010][SQL] Sort information is lost after sort m...

2017-03-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17339
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17321: [SPARK-19899][ML] Replace featuresCol with itemsC...

2017-03-20 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17321


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17339: [SPARK-20010][SQL] Sort information is lost after sort m...

2017-03-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17339
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74888/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17293: [SPARK-19950][SQL] Fix to ignore nullable when df.load()...

2017-03-20 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/17293
  
@HyukjinKwon I added data validation using schema information for Parquet 
Reader. Could you please take a look?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17363: [SPARK-19970][SQL][BRANCH-2.1] Table owner should...

2017-03-20 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/17363#discussion_r106973465
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala 
---
@@ -828,7 +828,7 @@ private[hive] class HiveClientImpl(
   hiveTable.setFields(schema.asJava)
 }
 hiveTable.setPartCols(partCols.asJava)
-hiveTable.setOwner(conf.getUser)
+hiveTable.setOwner(state.getAuthenticator().getUserName())
--- End diff --

@vanzin . I made a backport for branch-2.1 and tested in kerberized cluster.
This one uses `state` as you advised.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14124: [SPARK-16472][SQL] Force user specified schema to the nu...

2017-03-20 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/14124
  
#17293 added data validation using schema information for Parquet Reader.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17343: [SPARK-20014] Optimize mergeSpillsWithFileStream method

2017-03-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17343
  
**[Test build #74893 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74893/testReport)**
 for PR 17343 at commit 
[`5fe279e`](https://github.com/apache/spark/commit/5fe279ea3b5a70206be71ff9a5ebf2175cf0f8d1).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14617: [SPARK-17019][Core] Expose on-heap and off-heap memory u...

2017-03-20 Thread squito

Github user squito commented on the issue:

https://github.com/apache/spark/pull/14617
  
good points @tgravescs .  What about making them additional metrics, turned 
on by a checkbox, like the extra task metrics?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

2017-03-20 Thread weiqingy

Github user weiqingy commented on a diff in the pull request:

https://github.com/apache/spark/pull/17342#discussion_r106977152
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala ---
@@ -148,6 +149,8 @@ private[sql] class SharedState(val sparkContext: 
SparkContext) extends Logging {
 
 object SharedState {
 
+  URL.setURLStreamHandlerFactory(new SparkUrlStreamHandlerFactory())
--- End diff --

`URL#setURLStreamHandlerFactory` can only be called once per JVM. If we use 
`FsUrlStreamHandlerFactory` directly, we won't be able to support other 
factories. I wrapped it for future extendability.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

2017-03-20 Thread weiqingy

Github user weiqingy commented on a diff in the pull request:

https://github.com/apache/spark/pull/17342#discussion_r106977805
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2767,3 +2767,24 @@ private[spark] class CircularBuffer(sizeInBytes: Int 
= 10240) extends java.io.Ou
 new String(nonCircularBuffer, StandardCharsets.UTF_8)
   }
 }
+
+
+/**
+ * Factory for URL stream handlers. It relies on 'protocol' to choose the 
appropriate
+ * UrlStreamHandlerFactory to create URLStreamHandler. Adding new 'if' 
branches in
+ * 'createURLStreamHandler' like 'hdfsHandler' to support more protocols.
+ */
+private[spark] class SparkUrlStreamHandlerFactory extends 
URLStreamHandlerFactory {
+  private var hdfsHandler : URLStreamHandler = _
+
+  def createURLStreamHandler(protocol: String): URLStreamHandler = {
+if (protocol.compareToIgnoreCase("hdfs") == 0) {
--- End diff --

Yeah, that's a good point. I'll check with Hadoop for all supported file 
systems, and ideally if we can get them via some API.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

2017-03-20 Thread weiqingy

Github user weiqingy commented on a diff in the pull request:

https://github.com/apache/spark/pull/17342#discussion_r106978348
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2767,3 +2767,24 @@ private[spark] class CircularBuffer(sizeInBytes: Int 
= 10240) extends java.io.Ou
 new String(nonCircularBuffer, StandardCharsets.UTF_8)
   }
 }
+
+
+/**
+ * Factory for URL stream handlers. It relies on 'protocol' to choose the 
appropriate
+ * UrlStreamHandlerFactory to create URLStreamHandler. Adding new 'if' 
branches in
+ * 'createURLStreamHandler' like 'hdfsHandler' to support more protocols.
+ */
+private[spark] class SparkUrlStreamHandlerFactory extends 
URLStreamHandlerFactory {
+  private var hdfsHandler : URLStreamHandler = _
+
+  def createURLStreamHandler(protocol: String): URLStreamHandler = {
+if (protocol.compareToIgnoreCase("hdfs") == 0) {
+  if (hdfsHandler == null) {
+hdfsHandler = new 
FsUrlStreamHandlerFactory().createURLStreamHandler(protocol)
--- End diff --

Thanks, I'll follow your suggestion.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17362: [SPARK-20033][SQL] support hive permanent function

2017-03-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17362
  
**[Test build #74890 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74890/testReport)**
 for PR 17362 at commit 
[`9325a7c`](https://github.com/apache/spark/commit/9325a7c2a4ce9d0db6c39d7781e517082a2fe4be).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17362: [SPARK-20033][SQL] support hive permanent function

2017-03-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17362
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17362: [SPARK-20033][SQL] support hive permanent function

2017-03-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17362
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74890/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

2017-03-20 Thread zero323

Github user zero323 commented on the issue:

https://github.com/apache/spark/pull/17170
  
Jenkins retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

2017-03-20 Thread zero323

Github user zero323 commented on the issue:

https://github.com/apache/spark/pull/17218
  
Jenkins retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

2017-03-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17170
  
**[Test build #74896 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74896/testReport)**
 for PR 17170 at commit 
[`706514d`](https://github.com/apache/spark/commit/706514da26107ef25bef028e2143fa0a09e5cc19).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17361: [SPARK-20030][SS][WIP]Event-time-based timeout for MapGr...

2017-03-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17361
  
**[Test build #74894 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74894/testReport)**
 for PR 17361 at commit 
[`6e9f408`](https://github.com/apache/spark/commit/6e9f408d73090429d7497840d6daa7a4e19439e6).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

2017-03-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17218
  
**[Test build #74895 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74895/testReport)**
 for PR 17218 at commit 
[`0a3798d`](https://github.com/apache/spark/commit/0a3798d906e1341303b6872d44d5ce68d853aae4).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

2017-03-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17218
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

2017-03-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17218
  
**[Test build #74895 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74895/testReport)**
 for PR 17218 at commit 
[`0a3798d`](https://github.com/apache/spark/commit/0a3798d906e1341303b6872d44d5ce68d853aae4).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class HasItemsCol(Params):`
  * `class FPGrowthModel(JavaModel, JavaMLWritable, JavaMLReadable):`
  * `class FPGrowth(JavaEstimator, HasItemsCol, HasPredictionCol,`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

2017-03-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17218
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74895/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17343: [SPARK-20014] Optimize mergeSpillsWithFileStream method

2017-03-20 Thread sitalkedia

Github user sitalkedia commented on the issue:

https://github.com/apache/spark/pull/17343
  
@rxin - Updated documentation. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17354: [SPARK-20024] [SQL] [test-maven] SessionCatalog API setC...

2017-03-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17354
  
**[Test build #74892 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74892/testReport)**
 for PR 17354 at commit 
[`31f1099`](https://github.com/apache/spark/commit/31f1099e4c926cbfdf0eb74e999be051e7257d17).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17354: [SPARK-20024] [SQL] [test-maven] SessionCatalog API setC...

2017-03-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17354
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17354: [SPARK-20024] [SQL] [test-maven] SessionCatalog API setC...

2017-03-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17354
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74892/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17343: [SPARK-20014] Optimize mergeSpillsWithFileStream method

2017-03-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17343
  
**[Test build #74897 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74897/testReport)**
 for PR 17343 at commit 
[`06c1909`](https://github.com/apache/spark/commit/06c1909fbf35170639d12f32480b369c702dbe24).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

2017-03-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17218
  
**[Test build #74898 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74898/testReport)**
 for PR 17218 at commit 
[`5f4673e`](https://github.com/apache/spark/commit/5f4673e74049d9f6918f4e029215ae6c8364043e).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

2017-03-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17218
  
**[Test build #74898 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74898/testReport)**
 for PR 17218 at commit 
[`5f4673e`](https://github.com/apache/spark/commit/5f4673e74049d9f6918f4e029215ae6c8364043e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

2017-03-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17218
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

2017-03-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17218
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74898/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17302: [SPARK-19959][SQL] Fix to throw NullPointerExcept...

2017-03-20 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/17302#discussion_r106992055
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -70,7 +70,20 @@ object RDDConversions {
 object ExternalRDD {
 
   def apply[T: Encoder](rdd: RDD[T], session: SparkSession): LogicalPlan = 
{
-val externalRdd = ExternalRDD(CatalystSerde.generateObjAttr[T], 
rdd)(session)
+val attr = {
+  val attr = CatalystSerde.generateObjAttr[T]
--- End diff --

I see. `ScalaReflection.schemaFor[T]` requires `TypeTag`. 
IIUC, to pass `TypeTag` to `CatalystSerde.generateObjAttr` seems to require 
several code changes across files.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17363: [SPARK-19970][SQL][BRANCH-2.1] Table owner should be USE...

2017-03-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17363
  
**[Test build #74891 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74891/testReport)**
 for PR 17363 at commit 
[`1328f1d`](https://github.com/apache/spark/commit/1328f1d204991f4b4a993a523bd01584bab3122c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17363: [SPARK-19970][SQL][BRANCH-2.1] Table owner should be USE...

2017-03-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17363
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74891/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17363: [SPARK-19970][SQL][BRANCH-2.1] Table owner should be USE...

2017-03-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17363
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2017-03-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/12004
  
**[Test build #74899 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74899/testReport)**
 for PR 12004 at commit 
[`83d9368`](https://github.com/apache/spark/commit/83d936870ad0651fc2622593e53d3e31d7eb8d4b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17328: [SPARK-19975][Python][SQL] Add map_keys and map_values f...

2017-03-20 Thread maver1ck

Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/17328
  
Looks good :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

2017-03-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17170
  
**[Test build #74896 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74896/testReport)**
 for PR 17170 at commit 
[`706514d`](https://github.com/apache/spark/commit/706514da26107ef25bef028e2143fa0a09e5cc19).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

2017-03-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17170
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

2017-03-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17170
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74896/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17250: [SPARK-19911][STREAMING] Add builder interface for Kines...

2017-03-20 Thread budde

Github user budde commented on the issue:

https://github.com/apache/spark/pull/17250
  
@brkyvz PR has been updated, apologies for the delay. I've added 
```SerializableCredentialsProvider.Builder```, which I'm willing to hear 
suggestions for a better name on. I wanted to stay away from something like 
```AWSCredentials.Builder``` so as to avoid confusion with similarly-named 
classes in the AWS Java SDK.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17250: [SPARK-19911][STREAMING] Add builder interface for Kines...

2017-03-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17250
  
**[Test build #74901 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74901/testReport)**
 for PR 17250 at commit 
[`d6afaef`](https://github.com/apache/spark/commit/d6afaef4099d9b20d0d1257ec5942d1bb5b868af).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17354: [SPARK-20024] [SQL] [test-maven] SessionCatalog API setC...

2017-03-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17354
  
**[Test build #74900 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74900/testReport)**
 for PR 17354 at commit 
[`e860fd7`](https://github.com/apache/spark/commit/e860fd77a62dbec4ce11c89864a925223aba6727).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17363: [SPARK-19970][SQL][BRANCH-2.1] Table owner should be USE...

2017-03-20 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/17363
  
This is also tested manually in kerberized cluster, @vanzin .

BTW, Spark 1.6 has the same issue at 
[ClientWrapper](https://github.com/apache/spark/blob/branch-1.6/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/ClientWrapper.scala#L379).

According to the email, there exists demands on more Apache Spark 1.6.X. 
May I create a backport for branch-1.6? How do you think about that?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17287: [SPARK-19945][SQL]add test suite for SessionCatal...

2017-03-20 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17287#discussion_r106997366
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala
 ---
@@ -27,41 +27,67 @@ import 
org.apache.spark.sql.catalyst.parser.CatalystSqlParser
 import org.apache.spark.sql.catalyst.plans.PlanTest
 import org.apache.spark.sql.catalyst.plans.logical.{Range, SubqueryAlias, 
View}
 
+class InMemorySessionCatalogSuite extends SessionCatalogSuite {
+  protected val utils = new CatalogTestUtils {
+override val tableInputFormat: String = 
"com.fruit.eyephone.CameraInputFormat"
+override val tableOutputFormat: String = 
"com.fruit.eyephone.CameraOutputFormat"
+override val defaultProvider: String = "parquet"
+override def newEmptyCatalog(): ExternalCatalog = new InMemoryCatalog
+  }
+}
+
 /**
- * Tests for [[SessionCatalog]] that assume that [[InMemoryCatalog]] is 
correctly implemented.
+ * Tests for [[SessionCatalog]]
  *
  * Note: many of the methods here are very similar to the ones in 
[[ExternalCatalogSuite]].
  * This is because [[SessionCatalog]] and [[ExternalCatalog]] share many 
similar method
  * signatures but do not extend a common parent. This is largely by design 
but
  * unfortunately leads to very similar test code in two places.
  */
-class SessionCatalogSuite extends PlanTest {
-  private val utils = new CatalogTestUtils {
-override val tableInputFormat: String = 
"com.fruit.eyephone.CameraInputFormat"
-override val tableOutputFormat: String = 
"com.fruit.eyephone.CameraOutputFormat"
-override val defaultProvider: String = "parquet"
-override def newEmptyCatalog(): ExternalCatalog = new InMemoryCatalog
-  }
+abstract class SessionCatalogSuite extends PlanTest {
+  protected val utils: CatalogTestUtils
+
+  protected val isHiveExternalCatalog = false
 
   import utils._
 
+  private def withBasicCatalog(f: SessionCatalog => Unit): Unit = {
+val catalog = new SessionCatalog(newBasicCatalog())
+catalog.createDatabase(newDb("default"), ignoreIfExists = true)
+try {
+  f(catalog)
+} finally {
+  catalog.reset()
+}
+  }
+
+  private def withEmptyCatalog(f: SessionCatalog => Unit): Unit = {
+val catalog = new SessionCatalog(newEmptyCatalog())
+catalog.createDatabase(newDb("default"), ignoreIfExists = true)
--- End diff --

Our `reset()` function assumes the existence of Default


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17354: [SPARK-20024] [SQL] [test-maven] SessionCatalog API setC...

2017-03-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17354
  
**[Test build #74902 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74902/testReport)**
 for PR 17354 at commit 
[`f23984d`](https://github.com/apache/spark/commit/f23984d2ab68bf15cd593106d052206f38dbc362).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

2017-03-20 Thread felixcheung

Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17170
  
this is the error from appveyor build
```
[00:16:14] [ERROR] 
C:\projects\spark\mllib\src\main\scala\org\apache\spark\ml\r\FPGrowthWrapper.scala:50:
 value setItemsCol is not a member of org.apache.spark.ml.fpm.FPGrowth
[00:16:14] possible cause: maybe a semicolon is missing before `value 
setItemsCol'?
[00:16:14] [ERROR]   .setItemsCol(itemsCol)
[00:16:14] [ERROR]^
[00:16:29] [ERROR] one error found
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

2017-03-20 Thread steveloughran

Github user steveloughran commented on a diff in the pull request:

https://github.com/apache/spark/pull/17342#discussion_r107001274
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2767,3 +2767,24 @@ private[spark] class CircularBuffer(sizeInBytes: Int 
= 10240) extends java.io.Ou
 new String(nonCircularBuffer, StandardCharsets.UTF_8)
   }
 }
+
+
+/**
+ * Factory for URL stream handlers. It relies on 'protocol' to choose the 
appropriate
+ * UrlStreamHandlerFactory to create URLStreamHandler. Adding new 'if' 
branches in
+ * 'createURLStreamHandler' like 'hdfsHandler' to support more protocols.
+ */
+private[spark] class SparkUrlStreamHandlerFactory extends 
URLStreamHandlerFactory {
+  private var hdfsHandler : URLStreamHandler = _
+
+  def createURLStreamHandler(protocol: String): URLStreamHandler = {
+if (protocol.compareToIgnoreCase("hdfs") == 0) {
--- End diff --

API? no, just fs.*.impl for the standard ones, discovery via 
META-INF/services and you don't want to go there. Probably better to have a 
core list of the hadoop redists (including the new 2.8+ adl & oss object 
stores), and the google cloud URL (gss ? )


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17364: [SPARK-20038] [core]: move the currentWriter=null...

2017-03-20 Thread steveloughran

GitHub user steveloughran opened a pull request:

https://github.com/apache/spark/pull/17364

[SPARK-20038] [core]: move the currentWriter=null assignments into finally 
{} â¦

## What changes were proposed in this pull request?

have the`FileFormatWriter.ExecuteWriteTask.releaseResources()` 
implementations  set `currentWriter=null` in a finally clause. This guarantees 
that if the first call to `currentWriter()` throws an exception, the second 
releaseResources() call made during the task cancel process will not trigger a 
second attempt to close the stream. 

## How was this patch tested?

Tricky. I've been fixing the underlying cause when I saw the problem 
(HADOOP-14204), but SPARK-10109 shows I'm not the first to have seen this. I 
can't replicate it locally any more, my code no longer being broken.

code review, however, should be straightforward

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/steveloughran/spark stevel/SPARK-20038-close

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17364.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17364


commit 725740b49a2b37392699092b1b0e08c63a6152ff
Author: Steve Loughran 
Date:   2017-03-20T19:54:51Z

SPARK-20038: move the currentWriter=null assignments into finally {} clauses

Change-Id: I1e07f5b90ba1a2b05978b1d65876d746d07d1f3c




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17250: [SPARK-19911][STREAMING] Add builder interface for Kines...

2017-03-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17250
  
**[Test build #74901 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74901/testReport)**
 for PR 17250 at commit 
[`d6afaef`](https://github.com/apache/spark/commit/d6afaef4099d9b20d0d1257ec5942d1bb5b868af).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  class Builder `
  * `  class Builder `


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17250: [SPARK-19911][STREAMING] Add builder interface for Kines...

2017-03-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17250
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17250: [SPARK-19911][STREAMING] Add builder interface for Kines...

2017-03-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17250
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74901/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17364: [SPARK-20038] [core]: FileFormatWriter.ExecuteWriteTask....

2017-03-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17364
  
**[Test build #74903 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74903/testReport)**
 for PR 17364 at commit 
[`725740b`](https://github.com/apache/spark/commit/725740b49a2b37392699092b1b0e08c63a6152ff).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17250: [SPARK-19911][STREAMING] Add builder interface fo...

2017-03-20 Thread brkyvz

Github user brkyvz commented on a diff in the pull request:

https://github.com/apache/spark/pull/17250#discussion_r107003143
  
--- Diff: 
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisInputDStream.scala
 ---
@@ -71,7 +75,256 @@ private[kinesis] class KinesisInputDStream[T: ClassTag](
 
   override def getReceiver(): Receiver[T] = {
 new KinesisReceiver(streamName, endpointUrl, regionName, 
initialPositionInStream,
-  checkpointAppName, checkpointInterval, storageLevel, messageHandler,
-  kinesisCredsProvider)
+  checkpointAppName, checkpointInterval, _storageLevel, messageHandler,
+  kinesisCredsProvider, dynamoDBCredsProvider, cloudWatchCredsProvider)
   }
 }
+
+@InterfaceStability.Stable
+object KinesisInputDStream {
+  /**
+   * Builder for [[KinesisInputDStream]] instances.
+   *
+   * @since 2.2.0
+   */
+  @InterfaceStability.Stable
+  class Builder {
+// Required params
+private var streamingContext: Option[StreamingContext] = None
+private var streamName: Option[String] = None
+private var checkpointAppName: Option[String] = None
+
+// Params with defaults
+private var endpointUrl: Option[String] = None
+private var regionName: Option[String] = None
+private var initialPositionInStream: Option[InitialPositionInStream] = 
None
+private var checkpointInterval: Option[Duration] = None
+private var storageLevel: Option[StorageLevel] = None
+private var kinesisCredsProvider: 
Option[SerializableCredentialsProvider] = None
+private var dynamoDBCredsProvider: 
Option[SerializableCredentialsProvider] = None
+private var cloudWatchCredsProvider: 
Option[SerializableCredentialsProvider] = None
+
+/**
+ * Sets the StreamingContext that will be used to construct the 
Kinesis DStream. This is a
+ * required parameter.
+ *
+ * @param ssc [[StreamingContext]] used to construct Kinesis DStreams
+ * @return Reference to this [[KinesisInputDStream.Builder]]
+ */
+def streamingContext(ssc: StreamingContext): Builder = {
+  streamingContext = Option(ssc)
+  this
+}
+
+/**
+ * Sets the StreamingContext that will be used to construct the 
Kinesis DStream. This is a
+ * required parameter.
+ *
+ * @param jssc [[JavaStreamingContext]] used to construct Kinesis 
DStreams
+ * @return Reference to this [[KinesisInputDStream.Builder]]
+ */
+def streamingContext(jssc: JavaStreamingContext): Builder = {
+  streamingContext = Option(jssc.ssc)
+  this
+}
+
+/**
+ * Sets the name of the Kinesis stream that the DStream will read 
from. This is a required
+ * parameter.
+ *
+ * @param streamName Name of Kinesis stream that the DStream will read 
from
+ * @return Reference to this [[KinesisInputDStream.Builder]]
+ */
+def streamName(streamName: String): Builder = {
+  this.streamName = Option(streamName)
+  this
+}
+
+/**
+ * Sets the KCL application name to use when checkpointing state to 
DynamoDB. This is a
+ * required parameter.
+ *
+ * @param appName Value to use for the KCL app name (used when 
creating the DynamoDB checkpoint
+ *table and when writing metrics to CloudWatch)
+ * @return Reference to this [[KinesisInputDStream.Builder]]
+ */
+def checkpointAppName(appName: String): Builder = {
+  checkpointAppName = Option(appName)
+  this
+}
+
+/**
+ * Sets the AWS Kinesis endpoint URL. Defaults to 
"https://kinesis.us-east-1.amazonaws.com"; if
+ * no custom value is specified
+ *
+ * @param url Kinesis endpoint URL to use
+ * @return Reference to this [[KinesisInputDStream.Builder]]
+ */
+def endpointUrl(url: String): Builder = {
+  endpointUrl = Option(url)
+  this
+}
+
+/**
+ * Sets the AWS region to construct clients for. Defaults to 
"us-east-1" if no custom value
+ * is specified.
+ *
+ * @param regionName Name of AWS region to use (e.g. "us-west-2")
+ * @return Reference to this [[KinesisInputDStream.Builder]]
+ */
+def regionName(regionName: String): Builder = {
+  this.regionName = Option(regionName)
+  this
+}
+
+/**
+ * Sets the initial position data is read from in the Kinesis stream. 
Defaults to
+ * [[InitialPositionInStream.LATEST]] if no custom value is specified.
+ *
+ * @param initialPosition InitialPositionInStream value specifying 
where Spark Streaming

[GitHub] spark issue #17354: [SPARK-20024] [SQL] [test-maven] SessionCatalog API setC...

2017-03-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17354
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17354: [SPARK-20024] [SQL] [test-maven] SessionCatalog API setC...

2017-03-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17354
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74902/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17354: [SPARK-20024] [SQL] [test-maven] SessionCatalog API setC...

2017-03-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17354
  
**[Test build #74902 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74902/testReport)**
 for PR 17354 at commit 
[`f23984d`](https://github.com/apache/spark/commit/f23984d2ab68bf15cd593106d052206f38dbc362).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17354: [SPARK-20024] [SQL] [test-maven] SessionCatalog API setC...

2017-03-20 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17354
  
retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17354: [SPARK-20024] [SQL] [test-maven] SessionCatalog API setC...

2017-03-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17354
  
**[Test build #74904 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74904/testReport)**
 for PR 17354 at commit 
[`f23984d`](https://github.com/apache/spark/commit/f23984d2ab68bf15cd593106d052206f38dbc362).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17295: [SPARK-19556][core] Do not encrypt block manager ...

2017-03-20 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/17295#discussion_r107007132
  
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala 
---
@@ -56,6 +57,43 @@ private[spark] class BlockResult(
 val bytes: Long)
 
 /**
+ * Abstracts away how blocks are stored and provides different ways to 
read the underlying block
+ * data. The data for a BlockData instance can only be read once, since it 
may be backed by open
+ * file descriptors that change state as data is read.
+ */
+private[spark] trait BlockData {
+
+  def toInputStream(): InputStream
+
+  def toManagedBuffer(): ManagedBuffer
+
+  def toByteBuffer(allocator: Int => ByteBuffer): ChunkedByteBuffer
+
+  def size: Long
+
+  def dispose(): Unit
+
+}
+
+private[spark] class ByteBufferBlockData(
+val buffer: ChunkedByteBuffer,
+autoDispose: Boolean = true) extends BlockData {
+
+  override def toInputStream(): InputStream = buffer.toInputStream(dispose 
= autoDispose)
+
+  override def toManagedBuffer(): ManagedBuffer = new 
NettyManagedBuffer(buffer.toNetty)
+
+  override def toByteBuffer(allocator: Int => ByteBuffer): 
ChunkedByteBuffer = {
+buffer.copy(allocator)
+  }
--- End diff --

So I had traced through that stuff 2 or 3 times, and now I did it again and 
I think I finally understood all that's going on. Basically, the old code was 
really bad at explicitly disposing of the buffers, meaning a bunch of paths 
(like the ones that used managed buffers) didn't bother to do it and just left 
the work to the GC.

I changed the code a bit to make the dispose more explicit and added 
comments in a few key places.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17295: [SPARK-19556][core] Do not encrypt block manager data in...

2017-03-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17295
  
**[Test build #74905 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74905/testReport)**
 for PR 17295 at commit 
[`1428fcd`](https://github.com/apache/spark/commit/1428fcd952ddcdb29a561d8c1c90d4f820955c15).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17361: [SPARK-20030][SS][WIP]Event-time-based timeout for MapGr...

2017-03-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17361
  
**[Test build #74894 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74894/testReport)**
 for PR 17361 at commit 
[`6e9f408`](https://github.com/apache/spark/commit/6e9f408d73090429d7497840d6daa7a4e19439e6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17361: [SPARK-20030][SS][WIP]Event-time-based timeout for MapGr...

2017-03-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17361
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74894/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17361: [SPARK-20030][SS][WIP]Event-time-based timeout for MapGr...

2017-03-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17361
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17354: [SPARK-20024] [SQL] [test-maven] SessionCatalog API setC...

2017-03-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17354
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17354: [SPARK-20024] [SQL] [test-maven] SessionCatalog API setC...

2017-03-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17354
  
**[Test build #74900 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74900/testReport)**
 for PR 17354 at commit 
[`e860fd7`](https://github.com/apache/spark/commit/e860fd77a62dbec4ce11c89864a925223aba6727).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17354: [SPARK-20024] [SQL] [test-maven] SessionCatalog API setC...

2017-03-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17354
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74900/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17315: [SPARK-19949][SQL] unify bad record handling in C...

2017-03-20 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17315#discussion_r107011371
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/FailureSafeParser.scala
 ---
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.util
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.GenericInternalRow
+import org.apache.spark.sql.types.StructType
+import org.apache.spark.unsafe.types.UTF8String
+
+class FailureSafeParser[IN](
+rawParser: IN => Seq[InternalRow],
+mode: String,
+schema: StructType,
+columnNameOfCorruptRecord: String) {
+
+  private val corruptFieldIndex = 
schema.getFieldIndex(columnNameOfCorruptRecord)
+  private val actualSchema = StructType(schema.filterNot(_.name == 
columnNameOfCorruptRecord))
+  private val resultRow = new GenericInternalRow(schema.length)
+  private val nullResult = new GenericInternalRow(schema.length)
+
+  // This function takes 2 parameters: an optional partial result, and the 
bad record. If the given
+  // schema doesn't contain a field for corrupted record, we just return 
the partial result or a
+  // row with all fields null. If the given schema contains a field for 
corrupted record, we will
+  // set the bad record to this field, and set other fields according to 
the partial result or null.
+  private val toResultRow: (Option[InternalRow], () => UTF8String) => 
InternalRow = {
+if (corruptFieldIndex.isDefined) {
+  (row, badRecord) => {
+var i = 0
+while (i < actualSchema.length) {
+  val f = actualSchema(i)
+  resultRow(schema.fieldIndex(f.name)) = row.map(_.get(i, 
f.dataType)).orNull
+  i += 1
+}
+resultRow(corruptFieldIndex.get) = badRecord()
+resultRow
+  }
+} else {
+  (row, _) => row.getOrElse(nullResult)
+}
+  }
+
+  def parse(input: IN): Iterator[InternalRow] = {
+try {
+  rawParser.apply(input).toIterator.map(row => toResultRow(Some(row), 
() => null))
+} catch {
+  case e: BadRecordException if ParseModes.isPermissiveMode(mode) =>
+Iterator(toResultRow(e.partialResult(), e.record))
+  case _: BadRecordException if ParseModes.isDropMalformedMode(mode) =>
+Iterator.empty
+  case e: BadRecordException => throw e.cause
--- End diff --

This is `FAIL_FAST_MODE`, if my understanding is not wrong. Should we issue 
the error message including  `FAILFAST`, like what we did before?

This is also an behavior change? If users did not correctly spell the mode 
string, we treated it as the `PERMISSIVE` mode. Now, we changed it to the 
`FAILFAST ` mode.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17315: [SPARK-19949][SQL] unify bad record handling in C...

2017-03-20 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17315#discussion_r107012038
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
@@ -382,11 +383,17 @@ class DataFrameReader private[sql](sparkSession: 
SparkSession) extends Logging {
 }
 
 verifyColumnNameOfCorruptRecord(schema, 
parsedOptions.columnNameOfCorruptRecord)
+val dataSchema = StructType(schema.filterNot(_.name == 
parsedOptions.columnNameOfCorruptRecord))
--- End diff --

Nit: `dataSchema ` -> `actualSchema`? Be consistent what we did in the 
other place?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

2017-03-20 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17170#discussion_r107011970
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/r/FPGrowthWrapper.scala 
---
@@ -0,0 +1,86 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.r
+
+import org.apache.hadoop.fs.Path
+import org.json4s.JsonDSL._
+import org.json4s.jackson.JsonMethods._
+
+import org.apache.spark.ml.fpm.{FPGrowth, FPGrowthModel}
+import org.apache.spark.ml.util._
+import org.apache.spark.sql.{DataFrame, Dataset}
+
+private[r] class FPGrowthWrapper private (val fpGrowthModel: 
FPGrowthModel) extends MLWritable {
+  def freqItemsets: DataFrame = fpGrowthModel.freqItemsets
+  def associationRules: DataFrame = fpGrowthModel.associationRules
+
+  def transform(dataset: Dataset[_]): DataFrame = {
+fpGrowthModel.transform(dataset)
+  }
+
+  override def write: MLWriter = new 
FPGrowthWrapper.FPGrowthWrapperWriter(this)
+}
+
+private[r] object FPGrowthWrapper extends MLReadable[FPGrowthWrapper] {
+
+  def fit(
+   data: DataFrame,
+   minSupport: Double,
+   minConfidence: Double,
+   itemsCol: String,
+   numPartitions: Integer): FPGrowthWrapper = {
+val fpGrowth = new FPGrowth()
+  .setMinSupport(minSupport)
+  .setMinConfidence(minConfidence)
+  .setItemsCol(itemsCol)
+
+if (numPartitions != null && numPartitions > 0) {
+  fpGrowth.setNumPartitions(numPartitions)
+}
+
+val fpGrowthModel = fpGrowth.fit(data)
+
+new FPGrowthWrapper(fpGrowthModel)
+  }
+
+  override def read: MLReader[FPGrowthWrapper] = new FPGrowthWrapperReader
+
+  class FPGrowthWrapperReader extends MLReader[FPGrowthWrapper] {
+override def load(path: String): FPGrowthWrapper = {
+  val modelPath = new Path(path, "model").toString
+  val fPGrowthModel = FPGrowthModel.load(modelPath)
+
+  new FPGrowthWrapper(fPGrowthModel)
+}
+  }
+
+  class FPGrowthWrapperWriter(instance: FPGrowthWrapper) extends MLWriter {
+override protected def saveImpl(path: String): Unit = {
+  val modelPath = new Path(path, "model").toString
+  val rMetadataPath = new Path(path, "rMetadata").toString
--- End diff --

anything else we could add as metadata that is not in the model already?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

2017-03-20 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17170#discussion_r107010057
  
--- Diff: R/pkg/R/mllib_fpm.R ---
@@ -0,0 +1,153 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# mllib_fpm.R: Provides methods for MLlib frequent pattern mining 
algorithms integration
+
+#' S4 class that represents a FPGrowthModel
+#'
+#' @param jobj a Java object reference to the backing Scala FPGrowthModel
+#' @export
+#' @note FPGrowthModel since 2.2.0
+setClass("FPGrowthModel", slots = list(jobj = "jobj"))
+
+#' FPGrowth
+#' 
+#' A parallel FP-growth algorithm to mine frequent itemsets. The algorithm 
is described in
+#' Li et al., PFP: Parallel FP-Growth for Query
+#' Recommendation <\url{http://dx.doi.org/10.1145/1454008.1454027}>. 
+#' PFP distributes computation in such a way that each worker executes an
+#' independent group of mining tasks. The FP-Growth algorithm is described 
in
+#' Han et al., Mining frequent patterns without
+#' candidate generation <\url{http://dx.doi.org/10.1145/335191.335372}>.
+#'
+#' @param data A SparkDataFrame for training.
+#' @param minSupport Minimal support level.
+#' @param minConfidence Minimal confidence level.
+#' @param itemsCol Items column name.
+#' @param numPartitions Number of partitions used for fitting.
+#' @param ... additional argument(s) passed to the method.
+#' @return \code{spark.fpGrowth} returns a fitted FPGrowth model.
+#' @rdname spark.fpGrowth
+#' @name spark.fpGrowth
+#' @aliases spark.fpGrowth,SparkDataFrame-method
+#' @export
+#' @examples
+#' \dontrun{
+#' raw_data <- read.df(
+#'   "data/mllib/sample_fpgrowth.txt",
+#'   source = "csv",
+#'   schema = structType(structField("raw_items", "string")))
+#'
+#' data <- selectExpr(raw_data, "split(raw_items, ' ') as items")
+#' model <- spark.fpGrowth(data)
+#'
+#' # Show frequent itemsets
+#' frequent_itemsets <- spark.freqItemsets(model)
+#' showDF(frequent_itemsets)
+#'
+#' # Show association rules
+#' association_rules <- spark.associationRules(model)
+#' showDF(association_rules)
+#'
+#' # Predict on new data
+#' new_itemsets <- data.frame(items = c("t", "t,s"))
+#' new_data <- selectExpr(createDataFrame(new_itemsets), "split(items, 
',') as items")
+#' predict(model, new_data)
+#'
+#' # Save and load model
+#' path <- "/path/to/model"
+#' write.ml(model, path)
+#' read.ml(path)
+#'
+#' # Optional arguments
+#' baskets_data <- selectExpr(createDataFrame(itemsets), "split(items, 
',') as baskets")
+#' another_model <- spark.fpGrowth(data, minSupport = 0.1, minConfidence = 
0.5
+#' itemsCol = "baskets", numPartitions = 
10)
+#' }
+#' @references \url{http://en.wikipedia.org/wiki/Association_rule_learning}
--- End diff --

we don't generally use this tag. Do you want to move to @seealso, or just 
link to in the description above


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

2017-03-20 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17170#discussion_r107011745
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/r/FPGrowthWrapper.scala 
---
@@ -0,0 +1,86 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.r
+
+import org.apache.hadoop.fs.Path
+import org.json4s.JsonDSL._
+import org.json4s.jackson.JsonMethods._
+
+import org.apache.spark.ml.fpm.{FPGrowth, FPGrowthModel}
+import org.apache.spark.ml.util._
+import org.apache.spark.sql.{DataFrame, Dataset}
+
+private[r] class FPGrowthWrapper private (val fpGrowthModel: 
FPGrowthModel) extends MLWritable {
+  def freqItemsets: DataFrame = fpGrowthModel.freqItemsets
+  def associationRules: DataFrame = fpGrowthModel.associationRules
+
+  def transform(dataset: Dataset[_]): DataFrame = {
+fpGrowthModel.transform(dataset)
+  }
+
+  override def write: MLWriter = new 
FPGrowthWrapper.FPGrowthWrapperWriter(this)
+}
+
+private[r] object FPGrowthWrapper extends MLReadable[FPGrowthWrapper] {
+
+  def fit(
+   data: DataFrame,
+   minSupport: Double,
+   minConfidence: Double,
+   itemsCol: String,
+   numPartitions: Integer): FPGrowthWrapper = {
+val fpGrowth = new FPGrowth()
+  .setMinSupport(minSupport)
+  .setMinConfidence(minConfidence)
+  .setItemsCol(itemsCol)
+
+if (numPartitions != null && numPartitions > 0) {
--- End diff --

given the earlier suggestion, we should also check numPartition > 0 in R 
before passing to here


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

2017-03-20 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17170#discussion_r107009471
  
--- Diff: R/pkg/R/mllib_fpm.R ---
@@ -0,0 +1,153 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# mllib_fpm.R: Provides methods for MLlib frequent pattern mining 
algorithms integration
+
+#' S4 class that represents a FPGrowthModel
+#'
+#' @param jobj a Java object reference to the backing Scala FPGrowthModel
+#' @export
+#' @note FPGrowthModel since 2.2.0
+setClass("FPGrowthModel", slots = list(jobj = "jobj"))
+
+#' FPGrowth
+#' 
+#' A parallel FP-growth algorithm to mine frequent itemsets. The algorithm 
is described in
+#' Li et al., PFP: Parallel FP-Growth for Query
+#' Recommendation <\url{http://dx.doi.org/10.1145/1454008.1454027}>. 
--- End diff --

can you check if this generate the doc properly
`<\url{http://dx.doi.org/10.1145/1454008.1454027}>`
generally it should be 
`\href{http://...}{Text}`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

2017-03-20 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17170#discussion_r107009205
  
--- Diff: R/pkg/R/mllib_fpm.R ---
@@ -0,0 +1,153 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# mllib_fpm.R: Provides methods for MLlib frequent pattern mining 
algorithms integration
+
+#' S4 class that represents a FPGrowthModel
+#'
+#' @param jobj a Java object reference to the backing Scala FPGrowthModel
+#' @export
+#' @note FPGrowthModel since 2.2.0
+setClass("FPGrowthModel", slots = list(jobj = "jobj"))
+
+#' FPGrowth
--- End diff --

I think we discussed this - let's make it `FP-Growth` or `Frequent Pattern 
Mining` 
(https://spark.apache.org/docs/latest/mllib-frequent-pattern-mining.html) as 
the title


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17343: [SPARK-20014] Optimize mergeSpillsWithFileStream method

2017-03-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17343
  
**[Test build #74893 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74893/testReport)**
 for PR 17343 at commit 
[`5fe279e`](https://github.com/apache/spark/commit/5fe279ea3b5a70206be71ff9a5ebf2175cf0f8d1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17295: [SPARK-19556][core] Do not encrypt block manager data in...

2017-03-20 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17295
  
**[Test build #74906 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74906/testReport)**
 for PR 17295 at commit 
[`1b2a3e4`](https://github.com/apache/spark/commit/1b2a3e48e07118b03e9607bf0dc99cf53ae4d678).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

2017-03-20 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17170#discussion_r107009625
  
--- Diff: R/pkg/R/mllib_fpm.R ---
@@ -0,0 +1,153 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# mllib_fpm.R: Provides methods for MLlib frequent pattern mining 
algorithms integration
+
+#' S4 class that represents a FPGrowthModel
+#'
+#' @param jobj a Java object reference to the backing Scala FPGrowthModel
+#' @export
+#' @note FPGrowthModel since 2.2.0
+setClass("FPGrowthModel", slots = list(jobj = "jobj"))
+
+#' FPGrowth
+#' 
+#' A parallel FP-growth algorithm to mine frequent itemsets. The algorithm 
is described in
+#' Li et al., PFP: Parallel FP-Growth for Query
+#' Recommendation <\url{http://dx.doi.org/10.1145/1454008.1454027}>. 
+#' PFP distributes computation in such a way that each worker executes an
+#' independent group of mining tasks. The FP-Growth algorithm is described 
in
+#' Han et al., Mining frequent patterns without
+#' candidate generation <\url{http://dx.doi.org/10.1145/335191.335372}>.
--- End diff --

ditto here for url.
In fact, I'm not sure we need to include all the links here but instead 
link to 
https://spark.apache.org/docs/latest/mllib-frequent-pattern-mining.html


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

2017-03-20 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17170#discussion_r107010797
  
--- Diff: R/pkg/R/mllib_fpm.R ---
@@ -0,0 +1,153 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# mllib_fpm.R: Provides methods for MLlib frequent pattern mining 
algorithms integration
+
+#' S4 class that represents a FPGrowthModel
+#'
+#' @param jobj a Java object reference to the backing Scala FPGrowthModel
+#' @export
+#' @note FPGrowthModel since 2.2.0
+setClass("FPGrowthModel", slots = list(jobj = "jobj"))
+
+#' FPGrowth
+#' 
+#' A parallel FP-growth algorithm to mine frequent itemsets. The algorithm 
is described in
+#' Li et al., PFP: Parallel FP-Growth for Query
+#' Recommendation <\url{http://dx.doi.org/10.1145/1454008.1454027}>. 
+#' PFP distributes computation in such a way that each worker executes an
+#' independent group of mining tasks. The FP-Growth algorithm is described 
in
+#' Han et al., Mining frequent patterns without
+#' candidate generation <\url{http://dx.doi.org/10.1145/335191.335372}>.
+#'
+#' @param data A SparkDataFrame for training.
+#' @param minSupport Minimal support level.
+#' @param minConfidence Minimal confidence level.
+#' @param itemsCol Items column name.
+#' @param numPartitions Number of partitions used for fitting.
+#' @param ... additional argument(s) passed to the method.
+#' @return \code{spark.fpGrowth} returns a fitted FPGrowth model.
+#' @rdname spark.fpGrowth
+#' @name spark.fpGrowth
+#' @aliases spark.fpGrowth,SparkDataFrame-method
+#' @export
+#' @examples
+#' \dontrun{
+#' raw_data <- read.df(
+#'   "data/mllib/sample_fpgrowth.txt",
+#'   source = "csv",
+#'   schema = structType(structField("raw_items", "string")))
+#'
+#' data <- selectExpr(raw_data, "split(raw_items, ' ') as items")
+#' model <- spark.fpGrowth(data)
+#'
+#' # Show frequent itemsets
+#' frequent_itemsets <- spark.freqItemsets(model)
+#' showDF(frequent_itemsets)
+#'
+#' # Show association rules
+#' association_rules <- spark.associationRules(model)
+#' showDF(association_rules)
+#'
+#' # Predict on new data
+#' new_itemsets <- data.frame(items = c("t", "t,s"))
+#' new_data <- selectExpr(createDataFrame(new_itemsets), "split(items, 
',') as items")
+#' predict(model, new_data)
+#'
+#' # Save and load model
+#' path <- "/path/to/model"
+#' write.ml(model, path)
+#' read.ml(path)
+#'
+#' # Optional arguments
+#' baskets_data <- selectExpr(createDataFrame(itemsets), "split(items, 
',') as baskets")
+#' another_model <- spark.fpGrowth(data, minSupport = 0.1, minConfidence = 
0.5
+#' itemsCol = "baskets", numPartitions = 
10)
+#' }
+#' @references \url{http://en.wikipedia.org/wiki/Association_rule_learning}
+#' @note spark.fpGrowth since 2.2.0
+setMethod("spark.fpGrowth", signature(data = "SparkDataFrame"),
+  function(data, minSupport = 0.3, minConfidence = 0.8,
+   itemsCol = "items", numPartitions = -1) {
--- End diff --

`numPartitions` by default is not set in Scala - let's default this to NULL 
instead here
(but do not as.integer if value is NULL - something like
numPartitions <- if (is.null(numPartitions)) NULL else 
as.integer(numPartitions)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17343: [SPARK-20014] Optimize mergeSpillsWithFileStream method

2017-03-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17343
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74893/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17343: [SPARK-20014] Optimize mergeSpillsWithFileStream method

2017-03-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17343
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

< 1 2 3 4 5 6 7 >

201 - 300 of 625 matches

Mail list logo