[GitHub] spark issue #11956: [SPARK-14098][SQL] Generate Java code that gets a float/...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11956 **[Test build #61915 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61915/consoleFull)** for PR 11956 at commit [`d035c42`](https://github.com/apache/spark/commit/d035c42db42b6ecbc252b6972419451aabd6e06d).
[GitHub] spark issue #14071: [SPARK-16397][SQL] make CatalogTable more general and le...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14071 **[Test build #61914 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61914/consoleFull)** for PR 14071 at commit [`4d65609`](https://github.com/apache/spark/commit/4d65609ae71b2e30cea7b39e1b5a1a9ecfdd2de4).
[GitHub] spark issue #14090: [SPARK-16112][SparkR] Programming guide for gapply/gappl...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14090 cc @felixcheung @mengxr
[GitHub] spark issue #13778: [SPARK-16062][SPARK-15989][SQL] Fix two bugs of Python-o...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/13778 From another point of view, is it necessary to propagate the Python UDF from the Python side to the JVM side? IIUC the serialization of Python UDTs happens on the Python side, and the JVM side can only see binary data for Python values, so there is nothing we can do on the Java side. Correct me if I am wrong, thanks.
[GitHub] spark issue #13701: [SPARK-15639][SQL] Try to push down filter at RowGroups ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/13701 @rdblue uh, I see. Thank you for your explanation! My suggestion above is to confirm what you said in @viirya's test cases. We expect to see the same results as what you mentioned. It sounds like dictionary filtering is available in Parquet 1.9. Really looking forward to it!
[GitHub] spark pull request #14028: [SPARK-16351][SQL] Avoid per-record type dispatch...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14028#discussion_r69936170 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JacksonGenerator.scala --- @@ -17,74 +17,180 @@ package org.apache.spark.sql.execution.datasources.json +import java.io.Writer + import com.fasterxml.jackson.core._ import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.expressions.SpecializedGetters import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ -private[sql] object JacksonGenerator { - /** Transforms a single InternalRow to JSON using Jackson - * - * TODO: make the code shared with the other apply method. - * - * @param rowSchema the schema object used for conversion - * @param gen a JsonGenerator object - * @param row The row to convert - */ - def apply(rowSchema: StructType, gen: JsonGenerator)(row: InternalRow): Unit = { -def valWriter: (DataType, Any) => Unit = { - case (_, null) | (NullType, _) => gen.writeNull() - case (StringType, v) => gen.writeString(v.toString) - case (TimestampType, v: Long) => gen.writeString(DateTimeUtils.toJavaTimestamp(v).toString) - case (IntegerType, v: Int) => gen.writeNumber(v) - case (ShortType, v: Short) => gen.writeNumber(v) - case (FloatType, v: Float) => gen.writeNumber(v) - case (DoubleType, v: Double) => gen.writeNumber(v) - case (LongType, v: Long) => gen.writeNumber(v) - case (DecimalType(), v: Decimal) => gen.writeNumber(v.toJavaBigDecimal) - case (ByteType, v: Byte) => gen.writeNumber(v.toInt) - case (BinaryType, v: Array[Byte]) => gen.writeBinary(v) - case (BooleanType, v: Boolean) => gen.writeBoolean(v) - case (DateType, v: Int) => gen.writeString(DateTimeUtils.toJavaDate(v).toString) - // For UDT values, they should be in the SQL type's corresponding value type. - // We should not see values in the user-defined class at here. - // For example, VectorUDT's SQL type is an array of double. So, we should expect that v is - // an ArrayData at here, instead of a Vector. - case (udt: UserDefinedType[_], v) => valWriter(udt.sqlType, v) - - case (ArrayType(ty, _), v: ArrayData) => -gen.writeStartArray() -v.foreach(ty, (_, value) => valWriter(ty, value)) -gen.writeEndArray() - - case (MapType(kt, vt, _), v: MapData) => -gen.writeStartObject() -v.foreach(kt, vt, { (k, v) => - gen.writeFieldName(k.toString) - valWriter(vt, v) -}) -gen.writeEndObject() - - case (StructType(ty), v: InternalRow) => -gen.writeStartObject() -var i = 0 -while (i < ty.length) { - val field = ty(i) - val value = v.get(i, field.dataType) - if (value != null) { -gen.writeFieldName(field.name) -valWriter(field.dataType, value) - } - i += 1 +private[sql] class JacksonGenerator(schema: StructType, writer: Writer) { + // A `ValueWriter` is responsible for writing a field of an `InternalRow` to appropriate + // JSON data. Here we are using `SpecializedGetters` rather than `InternalRow` so that + // we can directly access data in `ArrayData` without the help of `SpecificMutableRow`. 
+ private type ValueWriter = (SpecializedGetters, Int) => Unit + + // `ValueWriter`s for all fields of the schema + private val rootFieldWriters: Seq[ValueWriter] = schema.map(_.dataType).map(makeWriter) + + private val gen = new JsonFactory().createGenerator(writer).setRootValueSeparator(null) + + private def makeWriter(dataType: DataType): ValueWriter = dataType match { +case NullType => + (row: SpecializedGetters, ordinal: Int) => +gen.writeNull() + +case BooleanType => + (row: SpecializedGetters, ordinal: Int) => +gen.writeBoolean(row.getBoolean(ordinal)) + +case ByteType => + (row: SpecializedGetters, ordinal: Int) => +gen.writeNumber(row.getByte(ordinal)) + +case ShortType => + (row: SpecializedGetters, ordinal: Int) => +gen.writeNumber(row.getShort(ordinal)) + +case IntegerType => + (row: SpecializedGetters, ordinal: Int) => +gen.writeNumber(row.getInt(ordinal)) + +case LongType => + (row: SpecializedGetters, ordinal: Int) => +gen.writeNumber(row.getLong(ordinal)) + +case FloatType => + (row:
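The refactoring shown (partially) in the diff above replaces per-record type dispatch with writer closures that are resolved once per schema field. A minimal, self-contained sketch of that pattern follows; the types and names are illustrative stand-ins, not Spark's internal API.

```scala
// Illustrative sketch of the per-field writer pattern: resolve one writer
// closure per schema field at construction time, then reuse it for every row.
// All types here are stand-ins, not Spark's internal classes.
sealed trait FieldType
case object IntField extends FieldType
case object StringField extends FieldType

final case class SimpleRow(values: Array[Any]) {
  def getInt(i: Int): Int = values(i).asInstanceOf[Int]
  def getString(i: Int): String = values(i).asInstanceOf[String]
}

class SketchGenerator(schema: Seq[FieldType], out: StringBuilder) {
  private type ValueWriter = (SimpleRow, Int) => Unit

  // Built once; the per-row loop below never pattern-matches on types.
  private val fieldWriters: Array[ValueWriter] = schema.map(makeWriter).toArray

  private def makeWriter(ft: FieldType): ValueWriter = ft match {
    case IntField    => (row, ordinal) => out.append(row.getInt(ordinal))
    case StringField => (row, ordinal) => out.append('"').append(row.getString(ordinal)).append('"')
  }

  def write(row: SimpleRow): Unit = {
    out.append('{')
    var i = 0
    while (i < fieldWriters.length) {
      if (i > 0) out.append(',')
      out.append("\"f").append(i).append("\":")
      fieldWriters(i)(row, i)   // no type dispatch here
      i += 1
    }
    out.append('}')
  }
}

// Usage: new SketchGenerator(Seq(IntField, StringField), sb).write(SimpleRow(Array(1, "a")))
// appends {"f0":1,"f1":"a"} to sb.
```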
[GitHub] spark pull request #14028: [SPARK-16351][SQL] Avoid per-record type dispatch...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14028#discussion_r69936226 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JacksonGenerator.scala --- @@ -17,74 +17,180 @@ package org.apache.spark.sql.execution.datasources.json +import java.io.Writer + import com.fasterxml.jackson.core._ import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.expressions.SpecializedGetters import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ -private[sql] object JacksonGenerator { - /** Transforms a single InternalRow to JSON using Jackson - * - * TODO: make the code shared with the other apply method. - * - * @param rowSchema the schema object used for conversion - * @param gen a JsonGenerator object - * @param row The row to convert - */ - def apply(rowSchema: StructType, gen: JsonGenerator)(row: InternalRow): Unit = { -def valWriter: (DataType, Any) => Unit = { - case (_, null) | (NullType, _) => gen.writeNull() - case (StringType, v) => gen.writeString(v.toString) - case (TimestampType, v: Long) => gen.writeString(DateTimeUtils.toJavaTimestamp(v).toString) - case (IntegerType, v: Int) => gen.writeNumber(v) - case (ShortType, v: Short) => gen.writeNumber(v) - case (FloatType, v: Float) => gen.writeNumber(v) - case (DoubleType, v: Double) => gen.writeNumber(v) - case (LongType, v: Long) => gen.writeNumber(v) - case (DecimalType(), v: Decimal) => gen.writeNumber(v.toJavaBigDecimal) - case (ByteType, v: Byte) => gen.writeNumber(v.toInt) - case (BinaryType, v: Array[Byte]) => gen.writeBinary(v) - case (BooleanType, v: Boolean) => gen.writeBoolean(v) - case (DateType, v: Int) => gen.writeString(DateTimeUtils.toJavaDate(v).toString) - // For UDT values, they should be in the SQL type's corresponding value type. - // We should not see values in the user-defined class at here. - // For example, VectorUDT's SQL type is an array of double. So, we should expect that v is - // an ArrayData at here, instead of a Vector. - case (udt: UserDefinedType[_], v) => valWriter(udt.sqlType, v) - - case (ArrayType(ty, _), v: ArrayData) => -gen.writeStartArray() -v.foreach(ty, (_, value) => valWriter(ty, value)) -gen.writeEndArray() - - case (MapType(kt, vt, _), v: MapData) => -gen.writeStartObject() -v.foreach(kt, vt, { (k, v) => - gen.writeFieldName(k.toString) - valWriter(vt, v) -}) -gen.writeEndObject() - - case (StructType(ty), v: InternalRow) => -gen.writeStartObject() -var i = 0 -while (i < ty.length) { - val field = ty(i) - val value = v.get(i, field.dataType) - if (value != null) { -gen.writeFieldName(field.name) -valWriter(field.dataType, value) - } - i += 1 +private[sql] class JacksonGenerator(schema: StructType, writer: Writer) { + // A `ValueWriter` is responsible for writing a field of an `InternalRow` to appropriate + // JSON data. Here we are using `SpecializedGetters` rather than `InternalRow` so that + // we can directly access data in `ArrayData` without the help of `SpecificMutableRow`. + private type ValueWriter = (SpecializedGetters, Int) => Unit + + // `ValueWriter`s for all fields of the schema + private val rootFieldWriters: Seq[ValueWriter] = schema.map(_.dataType).map(makeWriter) --- End diff -- Let's use an array. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
[GitHub] spark pull request #14028: [SPARK-16351][SQL] Avoid per-record type dispatch...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14028#discussion_r69936163 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JacksonGenerator.scala --- @@ -17,74 +17,180 @@ package org.apache.spark.sql.execution.datasources.json +import java.io.Writer + import com.fasterxml.jackson.core._ import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.expressions.SpecializedGetters import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ -private[sql] object JacksonGenerator { - /** Transforms a single InternalRow to JSON using Jackson - * - * TODO: make the code shared with the other apply method. - * - * @param rowSchema the schema object used for conversion - * @param gen a JsonGenerator object - * @param row The row to convert - */ - def apply(rowSchema: StructType, gen: JsonGenerator)(row: InternalRow): Unit = { -def valWriter: (DataType, Any) => Unit = { - case (_, null) | (NullType, _) => gen.writeNull() - case (StringType, v) => gen.writeString(v.toString) - case (TimestampType, v: Long) => gen.writeString(DateTimeUtils.toJavaTimestamp(v).toString) - case (IntegerType, v: Int) => gen.writeNumber(v) - case (ShortType, v: Short) => gen.writeNumber(v) - case (FloatType, v: Float) => gen.writeNumber(v) - case (DoubleType, v: Double) => gen.writeNumber(v) - case (LongType, v: Long) => gen.writeNumber(v) - case (DecimalType(), v: Decimal) => gen.writeNumber(v.toJavaBigDecimal) - case (ByteType, v: Byte) => gen.writeNumber(v.toInt) - case (BinaryType, v: Array[Byte]) => gen.writeBinary(v) - case (BooleanType, v: Boolean) => gen.writeBoolean(v) - case (DateType, v: Int) => gen.writeString(DateTimeUtils.toJavaDate(v).toString) - // For UDT values, they should be in the SQL type's corresponding value type. - // We should not see values in the user-defined class at here. - // For example, VectorUDT's SQL type is an array of double. So, we should expect that v is - // an ArrayData at here, instead of a Vector. - case (udt: UserDefinedType[_], v) => valWriter(udt.sqlType, v) - - case (ArrayType(ty, _), v: ArrayData) => -gen.writeStartArray() -v.foreach(ty, (_, value) => valWriter(ty, value)) -gen.writeEndArray() - - case (MapType(kt, vt, _), v: MapData) => -gen.writeStartObject() -v.foreach(kt, vt, { (k, v) => - gen.writeFieldName(k.toString) - valWriter(vt, v) -}) -gen.writeEndObject() - - case (StructType(ty), v: InternalRow) => -gen.writeStartObject() -var i = 0 -while (i < ty.length) { - val field = ty(i) - val value = v.get(i, field.dataType) - if (value != null) { -gen.writeFieldName(field.name) -valWriter(field.dataType, value) - } - i += 1 +private[sql] class JacksonGenerator(schema: StructType, writer: Writer) { + // A `ValueWriter` is responsible for writing a field of an `InternalRow` to appropriate + // JSON data. Here we are using `SpecializedGetters` rather than `InternalRow` so that + // we can directly access data in `ArrayData` without the help of `SpecificMutableRow`. 
+ private type ValueWriter = (SpecializedGetters, Int) => Unit + + // `ValueWriter`s for all fields of the schema + private val rootFieldWriters: Seq[ValueWriter] = schema.map(_.dataType).map(makeWriter) + + private val gen = new JsonFactory().createGenerator(writer).setRootValueSeparator(null) + + private def makeWriter(dataType: DataType): ValueWriter = dataType match { +case NullType => + (row: SpecializedGetters, ordinal: Int) => +gen.writeNull() + +case BooleanType => + (row: SpecializedGetters, ordinal: Int) => +gen.writeBoolean(row.getBoolean(ordinal)) + +case ByteType => + (row: SpecializedGetters, ordinal: Int) => +gen.writeNumber(row.getByte(ordinal)) + +case ShortType => + (row: SpecializedGetters, ordinal: Int) => +gen.writeNumber(row.getShort(ordinal)) + +case IntegerType => + (row: SpecializedGetters, ordinal: Int) => +gen.writeNumber(row.getInt(ordinal)) + +case LongType => + (row: SpecializedGetters, ordinal: Int) => +gen.writeNumber(row.getLong(ordinal)) + +case FloatType => + (row:
[GitHub] spark issue #14071: [SPARK-16397][SQL] make CatalogTable more general and le...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14071 cc @yhuai @gatorsmile @liancheng @clockfly
[GitHub] spark issue #13701: [SPARK-15639][SQL] Try to push down filter at RowGroups ...
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/13701 @gatorsmile, we've not seen a penalty from running row-group-level tests when no row groups are filtered, and we've decided to turn on dictionary filtering by default. You may see a penalty from using Parquet's internal record-level filter rather than a codegened filter. My recommendation (which is being discussed on the Parquet list) is to add the ability to filter row groups without turning on record-level filters. That should be easy and would solve your problem.
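For context, row-group pruning in Spark is gated by the Parquet filter pushdown flag. The sketch below only shows how a pushed-down predicate is exercised from the DataFrame API; `spark.sql.parquet.filterPushdown` is a real Spark SQL conf, while the path and column name are made up for illustration.

```scala
// Hedged sketch: with filter pushdown enabled, a simple comparison predicate can
// prune whole row groups via their min/max statistics (and, with dictionary
// filtering, via dictionary pages) before any records are read and filtered.
import org.apache.spark.sql.SparkSession

object PushdownExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("pushdown-example").getOrCreate()
    spark.conf.set("spark.sql.parquet.filterPushdown", "true")

    // Hypothetical table and column; if a row group's statistics for `id`
    // exclude 42, the whole group is skipped without record-level filtering.
    val df = spark.read.parquet("/path/to/events")
    println(df.filter("id = 42").count())

    spark.stop()
  }
}
```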
[GitHub] spark issue #14077: [SPARK-16402] [SQL] JDBC Source: Implement save API of D...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14077 @JustinPihony How about first moving the `copy` function in your PR now? Then, we can review your PR before SPARK-16401 is resolved.
[GitHub] spark issue #13701: [SPARK-15639][SQL] Try to push down filter at RowGroups ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/13701 @viirya Maybe you have not read my discussion with @rdblue. @rdblue already explained how Parquet works internally. As I said above, I think we still need a test to confirm that there is no noticeable extra penalty when no row is filtered out. Could you do a quick check based on your existing test? Thanks!
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user janplus commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69933267 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -652,6 +654,145 @@ case class StringRPad(str: Expression, len: Expression, pad: Expression) override def prettyName: String = "rpad" } +object ParseUrl { + private val HOST = UTF8String.fromString("HOST") + private val PATH = UTF8String.fromString("PATH") + private val QUERY = UTF8String.fromString("QUERY") + private val REF = UTF8String.fromString("REF") + private val PROTOCOL = UTF8String.fromString("PROTOCOL") + private val FILE = UTF8String.fromString("FILE") + private val AUTHORITY = UTF8String.fromString("AUTHORITY") + private val USERINFO = UTF8String.fromString("USERINFO") + private val REGEXPREFIX = "(&|^)" + private val REGEXSUBFIX = "=([^&]*)" +} + +/** + * Extracts a part from a URL + */ +@ExpressionDescription( + usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL", + extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO. +Key specifies which query to extract. +Examples: + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST') + 'spark.apache.org' + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY') + 'query=1' + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 'query') + '1'""") +case class ParseUrl(children: Seq[Expression]) + extends Expression with ImplicitCastInputTypes with CodegenFallback { + + override def nullable: Boolean = true + override def inputTypes: Seq[DataType] = Seq.fill(children.size)(StringType) + override def dataType: DataType = StringType + override def prettyName: String = "parse_url" + + // If the url is a constant, cache the URL object so that we don't need to convert url + // from UTF8String to String to URL for every row. + @transient private lazy val cachedUrl = children(0) match { +case Literal(url: UTF8String, _) => getUrl(url) +case _ => null + } + + // If the key is a constant, cache the Pattern object so that we don't need to convert key + // from UTF8String to String to StringBuilder to String to Pattern for every row. + @transient private lazy val cachedPattern = children(2) match { +case Literal(key: UTF8String, _) => getPattern(key) --- End diff -- Good point. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14052: [SPARK-15440] [Core] [Deploy] Add CSRF Filter for...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/14052#discussion_r69933155 --- Diff: core/src/main/scala/org/apache/spark/deploy/rest/RestSubmissionServer.scala --- @@ -93,6 +94,14 @@ private[spark] abstract class RestSubmissionServer( contextToServlet.foreach { case (prefix, servlet) => mainHandler.addServlet(new ServletHolder(servlet), prefix) } +if(masterConf.getBoolean("spark.rest.csrf.enable", false)) { --- End diff -- I'm not familiar with this, but the question is: should the config be documented? I assume this is something you want the end user to have the option of using. If this is only being used by the private REST APIs, should it be true by default, or what exactly are the ramifications of that?
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user janplus commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69932808 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -652,6 +654,145 @@ case class StringRPad(str: Expression, len: Expression, pad: Expression) override def prettyName: String = "rpad" } +object ParseUrl { + private val HOST = UTF8String.fromString("HOST") + private val PATH = UTF8String.fromString("PATH") + private val QUERY = UTF8String.fromString("QUERY") + private val REF = UTF8String.fromString("REF") + private val PROTOCOL = UTF8String.fromString("PROTOCOL") + private val FILE = UTF8String.fromString("FILE") + private val AUTHORITY = UTF8String.fromString("AUTHORITY") + private val USERINFO = UTF8String.fromString("USERINFO") + private val REGEXPREFIX = "(&|^)" + private val REGEXSUBFIX = "=([^&]*)" +} + +/** + * Extracts a part from a URL + */ +@ExpressionDescription( + usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL", + extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO. +Key specifies which query to extract. +Examples: + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST') + 'spark.apache.org' + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY') + 'query=1' + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 'query') + '1'""") --- End diff -- OK, I'll fix this, thank you, --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #13778: [SPARK-16062][SPARK-15989][SQL] Fix two bugs of Python-o...
Github user vlad17 commented on the issue: https://github.com/apache/spark/pull/13778 LGTM +1
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user janplus commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69932567 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -652,6 +654,160 @@ case class StringRPad(str: Expression, len: Expression, pad: Expression) override def prettyName: String = "rpad" } +object ParseUrl { + private val HOST = UTF8String.fromString("HOST") + private val PATH = UTF8String.fromString("PATH") + private val QUERY = UTF8String.fromString("QUERY") + private val REF = UTF8String.fromString("REF") + private val PROTOCOL = UTF8String.fromString("PROTOCOL") + private val FILE = UTF8String.fromString("FILE") + private val AUTHORITY = UTF8String.fromString("AUTHORITY") + private val USERINFO = UTF8String.fromString("USERINFO") + private val REGEXPREFIX = "(&|^)" + private val REGEXSUBFIX = "=([^&]*)" +} + +/** + * Extracts a part from a URL + */ +@ExpressionDescription( + usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL", + extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO. +Key specifies which query to extract. +Examples: + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST') + 'spark.apache.org' + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY') + 'query=1' + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 'query') + '1'""") +case class ParseUrl(children: Seq[Expression]) + extends Expression with ImplicitCastInputTypes with CodegenFallback { + + override def nullable: Boolean = true + override def inputTypes: Seq[DataType] = Seq.fill(children.size)(StringType) + override def dataType: DataType = StringType + override def prettyName: String = "parse_url" + + // If the url is a constant, cache the URL object so that we don't need to convert url + // from UTF8String to String to URL for every row. + @transient private lazy val cachedUrl = stringExprs(0) match { +case Literal(url: UTF8String, _) => getUrl(url) +case _ => null + } + + // If the key is a constant, cache the Pattern object so that we don't need to convert key + // from UTF8String to String to StringBuilder to String to Pattern for every row. 
+ @transient private lazy val cachedPattern = stringExprs(2) match { +case Literal(key: UTF8String, _) => getPattern(key) +case _ => null + } + + private lazy val stringExprs = children.toArray + import ParseUrl._ + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.size > 3 || children.size < 2) { + TypeCheckResult.TypeCheckFailure(s"$prettyName function requires two or three arguments") +} else { + super[ImplicitCastInputTypes].checkInputDataTypes() +} + } + + private def getPattern(key: UTF8String): Pattern = { +if (key != null) { + Pattern.compile(REGEXPREFIX + key.toString + REGEXSUBFIX) +} else { + null +} + } + + private def getUrl(url: UTF8String): URL = { +try { + new URL(url.toString) +} catch { + case e: MalformedURLException => null +} + } + + private def extractValueFromQuery(query: UTF8String, pattern: Pattern): UTF8String = { +val m = pattern.matcher(query.toString) +if (m.find()) { + UTF8String.fromString(m.group(2)) +} else { + null +} + } + + private def extractFromUrl(url: URL, partToExtract: UTF8String): UTF8String = { +if (partToExtract.equals(HOST)) { + UTF8String.fromString(url.getHost) +} else if (partToExtract.equals(PATH)) { + UTF8String.fromString(url.getPath) +} else if (partToExtract.equals(QUERY)) { + UTF8String.fromString(url.getQuery) +} else if (partToExtract.equals(REF)) { + UTF8String.fromString(url.getRef) +} else if (partToExtract.equals(PROTOCOL)) { + UTF8String.fromString(url.getProtocol) +} else if (partToExtract.equals(FILE)) { + UTF8String.fromString(url.getFile) +} else if (partToExtract.equals(AUTHORITY)) { + UTF8String.fromString(url.getAuthority) +} else if (partToExtract.equals(USERINFO)) { + UTF8String.fromString(url.getUserInfo) +} else { + null --- End diff -- Since check it here is at Excutor side, it will be of no difference with current implementation. I think the point is whether we can assume that in almost all cases `part` is `Literal`. --- If your project is set up for it, you can reply to this email and have your reply
[GitHub] spark issue #14092: [SPARK-16419][SQL] EnsureRequirements adds extra Sort to...
Github user MasterDDT commented on the issue: https://github.com/apache/spark/pull/14092 cc @JoshRosen @rxin I wasn't sure if the right fix here is for `Expression` to override `equals` and use `semanticEquals`; that would be a bigger change, but I think it would work. I also noticed the `EquivalentExpressions` class, but that seemed to be only for codegen.
[GitHub] spark issue #14092: [SPARK-16419][SQL] EnsureRequirements adds extra Sort to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14092 Can one of the admins verify this patch?
[GitHub] spark issue #13765: [SPARK-16052][SQL] Improve `CollapseRepartition` optimiz...
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13765 Under what circumstances will a user use 2 or more adjacent re-partitioning operators?
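One plausible answer, sketched below: independently written transformations that each impose their own partitioning end up adjacent when composed. The table path, column, and partition counts are invented for illustration.

```scala
// Hedged sketch of how adjacent repartition operators can arise in user code.
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object AdjacentRepartitions {
  def loadEvents(spark: SparkSession): DataFrame =
    spark.read.parquet("/data/events").repartition(200)   // helper A's choice

  def prepareForJoin(events: DataFrame): DataFrame =
    events.repartition(50, col("userId"))                 // helper B's choice

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("adjacent-repartitions").getOrCreate()
    // The composed plan contains two adjacent re-partitioning operators;
    // CollapseRepartition can keep only the outer one.
    val prepared = prepareForJoin(loadEvents(spark))
    prepared.explain()
    spark.stop()
  }
}
```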
[GitHub] spark pull request #13765: [SPARK-16052][SQL] Improve `CollapseRepartition` ...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13765#discussion_r69930648 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -537,12 +537,19 @@ object CollapseProject extends Rule[LogicalPlan] { } /** - * Combines adjacent [[Repartition]] operators by keeping only the last one. + * Combines adjacent [[Repartition]] and [[RepartitionByExpression]] operator combinations + * by keeping only the one. --- End diff -- > ... by only keeping the top-level one.
[GitHub] spark pull request #14092: [SPARK-16419][SQL] EnsureRequirements adds extra ...
GitHub user MasterDDT opened a pull request: https://github.com/apache/spark/pull/14092 [SPARK-16419][SQL] EnsureRequirements adds extra Sort to already sorted cached table ## What changes were proposed in this pull request? EnsureRequirements compares the required and given sort orderings, but uses Scala equals instead of semantic equals, so column capitalization isn't considered and the comparison also fails for a cached table. As a result, a SortMergeJoin on a cached, already-sorted table adds an extra sort. This change uses semanticEquals instead of Scala equals to compare the two `Seq[SortOrder]`. ## How was this patch tested? Added 3 tests; the last 2 tests break without the fix. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ActionIQ/spark SPARK-16419 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14092.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14092 commit ab48ca6464f1c05cf58b4d0e0f1b7e617fdcb5fb Author: MasterDDT Date: 2016-07-05T20:51:24Z Add tests commit 372466bf4eda8aa7f8a7319ce682df9cdd61d666 Author: MasterDDT Date: 2016-07-06T17:36:04Z Add tests commit b4b02bf3879daf9a4532b61a019ea33b0f3ff835 Author: MasterDDT Date: 2016-07-07T15:30:58Z Add fix
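The comparison the description refers to can be sketched roughly as follows. This is an illustration of the approach, not the exact patch, and `orderingSatisfied` is an invented helper name.

```scala
// Hedged sketch: treat a required ordering as satisfied when it is a semantic
// prefix of the child's output ordering, so exprId or capitalization differences
// (e.g. from a cached plan) do not force an extra Sort.
import org.apache.spark.sql.catalyst.expressions.SortOrder

object OrderingCheck {
  def orderingSatisfied(required: Seq[SortOrder], given: Seq[SortOrder]): Boolean =
    required.length <= given.length &&
      required.zip(given).forall { case (r, g) => r.semanticEquals(g) }
}
```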
[GitHub] spark pull request #13765: [SPARK-16052][SQL] Improve `CollapseRepartition` ...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13765#discussion_r69930213 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala --- @@ -370,8 +370,11 @@ package object dsl { case plan => SubqueryAlias(alias, plan) } - def distribute(exprs: Expression*): LogicalPlan = -RepartitionByExpression(exprs, logicalPlan) + def repartition(num: Integer): LogicalPlan = +Repartition(num, shuffle = true, logicalPlan) + + def distribute(exprs: Expression*)(n: Int = -1): LogicalPlan = +RepartitionByExpression(exprs, logicalPlan, numPartitions = if (n < 0) None else Some(n)) --- End diff -- Seems that adding a `distribute(n: Int, exprs: Expression*)` overloaded method is simpler?
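The two shapes under discussion, side by side. This is a fragment shaped like the DSL in the diff; the enclosing class and its `logicalPlan` field are stand-ins for the implicit DSL wrapper, and the constructor signature follows what the diff itself shows.

```scala
// Hedged sketch of the review suggestion, using the same constructor shown in the
// diff (RepartitionByExpression takes an Option[Int] for numPartitions there).
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, RepartitionByExpression}

class DslSketch(logicalPlan: LogicalPlan) {
  // Curried form from the patch; existing call sites would need a trailing ().
  def distributeCurried(exprs: Expression*)(n: Int = -1): LogicalPlan =
    RepartitionByExpression(exprs, logicalPlan, numPartitions = if (n < 0) None else Some(n))

  // Suggested overloads: existing distribute(exprs*) call sites stay unchanged.
  def distribute(exprs: Expression*): LogicalPlan =
    RepartitionByExpression(exprs, logicalPlan)

  def distribute(n: Int, exprs: Expression*): LogicalPlan =
    RepartitionByExpression(exprs, logicalPlan, Some(n))
}
```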
[GitHub] spark issue #14065: [SPARK-14743][YARN][WIP] Add a configurable token manage...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14065 Merged build finished. Test FAILed.
[GitHub] spark issue #14065: [SPARK-14743][YARN][WIP] Add a configurable token manage...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14065 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61912/ Test FAILed.
[GitHub] spark issue #14065: [SPARK-14743][YARN][WIP] Add a configurable token manage...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14065 **[Test build #61912 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61912/consoleFull)** for PR 14065 at commit [`c9d9ed0`](https://github.com/apache/spark/commit/c9d9ed0cd0aef6b8017c04e635f27ef123a48887). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69928758 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -652,6 +654,160 @@ case class StringRPad(str: Expression, len: Expression, pad: Expression) override def prettyName: String = "rpad" } +object ParseUrl { + private val HOST = UTF8String.fromString("HOST") + private val PATH = UTF8String.fromString("PATH") + private val QUERY = UTF8String.fromString("QUERY") + private val REF = UTF8String.fromString("REF") + private val PROTOCOL = UTF8String.fromString("PROTOCOL") + private val FILE = UTF8String.fromString("FILE") + private val AUTHORITY = UTF8String.fromString("AUTHORITY") + private val USERINFO = UTF8String.fromString("USERINFO") + private val REGEXPREFIX = "(&|^)" + private val REGEXSUBFIX = "=([^&]*)" +} + +/** + * Extracts a part from a URL + */ +@ExpressionDescription( + usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL", + extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO. +Key specifies which query to extract. +Examples: + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST') + 'spark.apache.org' + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY') + 'query=1' + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 'query') + '1'""") +case class ParseUrl(children: Seq[Expression]) + extends Expression with ImplicitCastInputTypes with CodegenFallback { + + override def nullable: Boolean = true + override def inputTypes: Seq[DataType] = Seq.fill(children.size)(StringType) + override def dataType: DataType = StringType + override def prettyName: String = "parse_url" + + // If the url is a constant, cache the URL object so that we don't need to convert url + // from UTF8String to String to URL for every row. + @transient private lazy val cachedUrl = stringExprs(0) match { +case Literal(url: UTF8String, _) => getUrl(url) +case _ => null + } + + // If the key is a constant, cache the Pattern object so that we don't need to convert key + // from UTF8String to String to StringBuilder to String to Pattern for every row. 
+ @transient private lazy val cachedPattern = stringExprs(2) match { +case Literal(key: UTF8String, _) => getPattern(key) +case _ => null + } + + private lazy val stringExprs = children.toArray + import ParseUrl._ + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.size > 3 || children.size < 2) { + TypeCheckResult.TypeCheckFailure(s"$prettyName function requires two or three arguments") +} else { + super[ImplicitCastInputTypes].checkInputDataTypes() +} + } + + private def getPattern(key: UTF8String): Pattern = { +if (key != null) { + Pattern.compile(REGEXPREFIX + key.toString + REGEXSUBFIX) +} else { + null +} + } + + private def getUrl(url: UTF8String): URL = { +try { + new URL(url.toString) +} catch { + case e: MalformedURLException => null +} + } + + private def extractValueFromQuery(query: UTF8String, pattern: Pattern): UTF8String = { +val m = pattern.matcher(query.toString) +if (m.find()) { + UTF8String.fromString(m.group(2)) +} else { + null +} + } + + private def extractFromUrl(url: URL, partToExtract: UTF8String): UTF8String = { +if (partToExtract.equals(HOST)) { + UTF8String.fromString(url.getHost) +} else if (partToExtract.equals(PATH)) { + UTF8String.fromString(url.getPath) +} else if (partToExtract.equals(QUERY)) { + UTF8String.fromString(url.getQuery) +} else if (partToExtract.equals(REF)) { + UTF8String.fromString(url.getRef) +} else if (partToExtract.equals(PROTOCOL)) { + UTF8String.fromString(url.getProtocol) +} else if (partToExtract.equals(FILE)) { + UTF8String.fromString(url.getFile) +} else if (partToExtract.equals(AUTHORITY)) { + UTF8String.fromString(url.getAuthority) +} else if (partToExtract.equals(USERINFO)) { + UTF8String.fromString(url.getUserInfo) +} else { + null --- End diff -- Can we add a boolean field to indicate whether `part` is foldable and check it here? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69928094 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -652,6 +654,145 @@ case class StringRPad(str: Expression, len: Expression, pad: Expression) override def prettyName: String = "rpad" } +object ParseUrl { + private val HOST = UTF8String.fromString("HOST") + private val PATH = UTF8String.fromString("PATH") + private val QUERY = UTF8String.fromString("QUERY") + private val REF = UTF8String.fromString("REF") + private val PROTOCOL = UTF8String.fromString("PROTOCOL") + private val FILE = UTF8String.fromString("FILE") + private val AUTHORITY = UTF8String.fromString("AUTHORITY") + private val USERINFO = UTF8String.fromString("USERINFO") + private val REGEXPREFIX = "(&|^)" + private val REGEXSUBFIX = "=([^&]*)" +} + +/** + * Extracts a part from a URL + */ +@ExpressionDescription( + usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL", + extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO. +Key specifies which query to extract. +Examples: + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST') + 'spark.apache.org' + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY') + 'query=1' + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 'query') + '1'""") +case class ParseUrl(children: Seq[Expression]) + extends Expression with ImplicitCastInputTypes with CodegenFallback { + + override def nullable: Boolean = true + override def inputTypes: Seq[DataType] = Seq.fill(children.size)(StringType) + override def dataType: DataType = StringType + override def prettyName: String = "parse_url" + + // If the url is a constant, cache the URL object so that we don't need to convert url + // from UTF8String to String to URL for every row. + @transient private lazy val cachedUrl = children(0) match { +case Literal(url: UTF8String, _) => getUrl(url) +case _ => null + } + + // If the key is a constant, cache the Pattern object so that we don't need to convert key + // from UTF8String to String to StringBuilder to String to Pattern for every row. + @transient private lazy val cachedPattern = children(2) match { +case Literal(key: UTF8String, _) => getPattern(key) --- End diff -- The need for verifying the behavior in Scala REPL probably indicates that we should check for null explicitly to make it more readabie. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
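The subtlety behind that suggestion is that a typed pattern never matches null, so a null string literal silently falls through to the catch-all; an explicit case states the intent. A small standalone illustration follows; the `describe` helper is invented for the example.

```scala
// A null value does not match the typed pattern `key: UTF8String`, which is why
// the original behavior had to be verified in the REPL. Spelling out the null
// case makes the intent explicit.
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.types.StringType
import org.apache.spark.unsafe.types.UTF8String

object NullLiteralPattern {
  def describe(e: Any): String = e match {
    case Literal(key: UTF8String, _) => s"non-null string literal: $key"
    case Literal(null, _)            => "null literal (previously fell into the catch-all)"
    case _                           => "not a literal"
  }

  def main(args: Array[String]): Unit = {
    println(describe(Literal(UTF8String.fromString("query"), StringType)))
    println(describe(Literal(null, StringType)))
  }
}
```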
[GitHub] spark issue #14088: [SPARK-16414] [YARN] Fix bugs for "Can not get user conf...
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/14088 Can you please fix the description? "Fix bugs for "Can not get user config when calling SparkHadoopUtil.get.conf in other places"." doesn't make sense to me. Where exactly is SparkHadoopUtil being instantiated before the ApplicationMaster? What cases does this apply to (client mode, cluster mode, etc.)? How do I reproduce it? Also, you say existing unit tests cover this; was one failing because of this? If not, perhaps we should add one.
[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13680#discussion_r69927398 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/UnsafeArrayDataBenchmark.scala --- @@ -0,0 +1,251 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.benchmark + +import scala.util.Random + +import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder +import org.apache.spark.sql.catalyst.expressions.{UnsafeArrayData, UnsafeRow} +import org.apache.spark.sql.catalyst.expressions.codegen.{BufferHolder, UnsafeArrayWriter} +import org.apache.spark.util.Benchmark + +/** + * Benchmark [[UnsafeArrayDataBenchmark]] for UnsafeArrayData + * To run this: + * 1. replace ignore(...) with test(...) + * 2. build/sbt "sql/test-only *benchmark.UnsafeArrayDataBenchmark" + * + * Benchmarks in this file are skipped in normal builds. + */ +class UnsafeArrayDataBenchmark extends BenchmarkBase { + + def calculateHeaderPortionInBytes(count: Int) : Int = { +// Use this assignment for SPARK-15962 +// val size = 4 + 4 * count +val size = UnsafeArrayData.calculateHeaderPortionInBytes(count) +size + } + + def readUnsafeArray(iters: Int): Unit = { +val count = 1024 * 1024 * 16 +val rand = new Random(42) + +var intResult: Int = 0 +val intBuffer = Array.fill[Int](count) { rand.nextInt } +val intEncoder = ExpressionEncoder[Array[Int]].resolveAndBind() +val intInternalRow = intEncoder.toRow(intBuffer) +val intUnsafeArray = intInternalRow.getArray(0) +val readIntArray = { i: Int => + var n = 0 + while (n < iters) { +val len = intUnsafeArray.numElements +var sum = 0.toInt +var i = 0 +while (i < len) { + sum += intUnsafeArray.getInt(i) + i += 1 +} +intResult = sum +n += 1 + } +} + +var doubleResult: Double = 0 +val doubleBuffer = Array.fill[Double](count) { rand.nextDouble } +val doubleEncoder = ExpressionEncoder[Array[Double]].resolveAndBind() +val doubleInternalRow = doubleEncoder.toRow(doubleBuffer) +val doubleUnsafeArray = doubleInternalRow.getArray(0) +val readDoubleArray = { i: Int => + var n = 0 + while (n < iters) { +val len = doubleUnsafeArray.numElements +var sum = 0.toDouble +var i = 0 +while (i < len) { + sum += doubleUnsafeArray.getDouble(i) + i += 1 +} +doubleResult = sum +n += 1 + } +} + +val benchmark = new Benchmark("Read UnsafeArrayData", count * iters) +benchmark.addCase("Int")(readIntArray) +benchmark.addCase("Double")(readDoubleArray) +benchmark.run +/* +Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.10.4 +Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz + +Read UnsafeArrayData:Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative + +Int279 / 294600.4 1.7 1.0X +Double 296 / 303567.0 1.8 0.9X +*/ + } + + def writeUnsafeArray(iters: Int): Unit = { +val count = 1024 * 
1024 * 16 + +val intUnsafeRow = new UnsafeRow(1) +val intUnsafeArrayWriter = new UnsafeArrayWriter --- End diff -- Got it. My interpretation was to use an `UnsafeArray` generated by `encoder.toRow(array)` for the benchmark. I will update `writeUnsafeArray` to measure the elapsed time of `encoder.toRow(array)`.
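A rough sketch of what "measure the elapsed time of `encoder.toRow(array)`" could look like, reusing the helpers already present in the quoted benchmark; the array size and case name are arbitrary, and `Benchmark` is Spark's internal test utility as used in the diff.

```scala
// Hedged sketch: time the encoder conversion itself, which is where the
// UnsafeArrayData is actually written.
import scala.util.Random

import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
import org.apache.spark.util.Benchmark

object WriteUnsafeArraySketch {
  def main(args: Array[String]): Unit = {
    val count = 1024 * 1024
    val rand = new Random(42)
    val intArray = Array.fill[Int](count)(rand.nextInt())
    val intEncoder = ExpressionEncoder[Array[Int]].resolveAndBind()

    val benchmark = new Benchmark("Write UnsafeArrayData", count)
    benchmark.addCase("Int") { _ =>
      intEncoder.toRow(intArray)   // conversion to UnsafeArrayData happens here
    }
    benchmark.run()
  }
}
```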
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69927073 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -652,6 +654,145 @@ case class StringRPad(str: Expression, len: Expression, pad: Expression) override def prettyName: String = "rpad" } +object ParseUrl { + private val HOST = UTF8String.fromString("HOST") + private val PATH = UTF8String.fromString("PATH") + private val QUERY = UTF8String.fromString("QUERY") + private val REF = UTF8String.fromString("REF") + private val PROTOCOL = UTF8String.fromString("PROTOCOL") + private val FILE = UTF8String.fromString("FILE") + private val AUTHORITY = UTF8String.fromString("AUTHORITY") + private val USERINFO = UTF8String.fromString("USERINFO") + private val REGEXPREFIX = "(&|^)" + private val REGEXSUBFIX = "=([^&]*)" +} + +/** + * Extracts a part from a URL + */ +@ExpressionDescription( + usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL", + extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO. +Key specifies which query to extract. +Examples: + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST') + 'spark.apache.org' + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY') + 'query=1' + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 'query') + '1'""") --- End diff -- We should probably `.stripMargin` here: ```scala """... |... """.stripMargin ``` Otherwise all leading white spaces are included in the extended description string. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
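For reference, the effect of the suggestion in isolation: with `stripMargin`, each line's leading whitespace up to and including the `|` is removed, so the source indentation does not end up in the extended description. The string content below is abbreviated from the usage text quoted above.

```scala
// Without stripMargin the indentation of the Scala source leaks into the string;
// with | margins it does not.
val extended =
  """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO.
    |Key specifies which query to extract.
    |Examples:
    |  > SELECT parse_url('http://spark.apache.org/path?query=1', 'HOST')
    |  'spark.apache.org'""".stripMargin
```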
[GitHub] spark issue #14077: [SPARK-16402] [SQL] JDBC Source: Implement save API of D...
Github user JustinPihony commented on the issue: https://github.com/apache/spark/pull/14077 Thanks. I will have to wait until SPARK-16401 is resolved, though, or else the code will not pass tests.
[GitHub] spark issue #14065: [SPARK-14743][YARN][WIP] Add a configurable token manage...
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/14065 I took a quick look through. It might be nice to think about how we could handle other credentials. For instance, Apache Kafka currently doesn't have tokens, so you need a keytab or TGT and a JAAS conf file. Yes, they are adding tokens, but in the meantime how does that work? Are there other services similar to that? Can we handle things other than tokens? It does appear that I could implement my own ServiceTokenProvider that goes off to really any service, and I can put things into the Credentials object as a Token or as a Secret, so perhaps we are covered here. But perhaps that means we should rename things to obtainCredentials rather than obtainTokens. Are there specific services you were thinking about here? We could at least use those as examples to make sure the interface fits them.
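A hedged sketch of what the rename could look like. This is not the interface in the WIP patch; the trait name, method names, and signatures are invented here purely to illustrate providers populating a Hadoop `Credentials` object (tokens or secret keys) instead of returning delegation tokens directly.

```scala
// Illustrative only: names and signatures are invented for this sketch.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.Credentials

import org.apache.spark.SparkConf

trait ServiceCredentialProvider {
  /** Short name used to enable or disable the provider via configuration. */
  def serviceName: String

  /** Whether this service needs credentials in the current deployment. */
  def credentialsRequired(hadoopConf: Configuration): Boolean = true

  /**
   * Obtain credentials for this service (delegation tokens, secret keys, ...)
   * and add them to `creds`. Returns the next renewal time, if any.
   */
  def obtainCredentials(
      hadoopConf: Configuration,
      sparkConf: SparkConf,
      creds: Credentials): Option[Long]
}
```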
[GitHub] spark issue #13123: [SPARK-15422] [Core] Remove unnecessary calculation of s...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/13123 I think this change has already been made by another PR, #13677. We can close this one now. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14065: [SPARK-14743][YARN][WIP] Add a configurable token...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/14065#discussion_r69918095 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -390,8 +390,9 @@ private[spark] class Client( // Upload Spark and the application JAR to the remote file system if necessary, // and add them as local resources to the application master. val fs = destDir.getFileSystem(hadoopConf) -val nns = YarnSparkHadoopUtil.get.getNameNodesToAccess(sparkConf) + destDir -YarnSparkHadoopUtil.get.obtainTokensForNamenodes(nns, hadoopConf, credentials) +hdfsTokenProvider(sparkConf).setNameNodesToAccess(sparkConf, Set(destDir)) --- End diff -- It seems a bit odd to me that we are doing these extra things for HDFS outside of the token provider. What happens if a user needs to do something similar for some other service and wants to implement their own class to handle it? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
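As a sketch of the alternative being hinted at, a provider could derive its own inputs from configuration instead of relying on HDFS-specific setters at the call site, so a user-supplied provider for another service would not need special-cased wiring. The class and method names below are made up for illustration; `spark.yarn.access.namenodes` is used here only as a plausible source of extra filesystems, and the Hadoop calls (`Path.getFileSystem`, `FileSystem.addDelegationTokens`, `Master.getMasterPrincipal`) are real APIs.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.mapred.Master
import org.apache.hadoop.security.Credentials
import org.apache.spark.SparkConf

// Illustrative only: the provider figures out which filesystems to visit from
// configuration, so callers never need to invoke setNameNodesToAccess(...) first.
class SelfContainedHdfsTokenProvider {
  def obtainTokens(sparkConf: SparkConf, hadoopConf: Configuration, creds: Credentials): Unit = {
    val extraFileSystems = sparkConf.get("spark.yarn.access.namenodes", "")
      .split(",").map(_.trim).filter(_.nonEmpty).map(new Path(_)).toSet
    val renewer = Master.getMasterPrincipal(hadoopConf)
    extraFileSystems.foreach { path =>
      // Collects delegation tokens for each configured namenode into `creds`.
      path.getFileSystem(hadoopConf).addDelegationTokens(renewer, creds)
    }
  }
}
```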
[GitHub] spark pull request #14065: [SPARK-14743][YARN][WIP] Add a configurable token...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/14065#discussion_r69917965 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/token/HDFSTokenProvider.scala --- @@ -0,0 +1,116 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.deploy.yarn.token + +import java.io.{ByteArrayInputStream, DataInputStream} + +import scala.collection.JavaConverters._ +import scala.collection.mutable.ArrayBuffer + +import org.apache.hadoop.conf.Configuration +import org.apache.hadoop.fs.Path +import org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier +import org.apache.hadoop.mapred.Master +import org.apache.hadoop.security.Credentials +import org.apache.hadoop.security.token.Token + +import org.apache.spark.{SparkConf, SparkException} +import org.apache.spark.deploy.yarn.config._ +import org.apache.spark.internal.Logging +import org.apache.spark.internal.config._ + +private[yarn] class HDFSTokenProvider + extends ServiceTokenProvider with ServiceTokenRenewable with Logging { + + private var nnsToAccess: Set[Path] = Set.empty + private var tokenRenewer: Option[String] = None + + override val serviceName: String = "hdfs" + + override def obtainTokensFromService( + sparkConf: SparkConf, + serviceConf: Configuration, + creds: Credentials) +: Array[Token[_]] = { +val tokens = ArrayBuffer[Token[_]]() +val renewer = tokenRenewer.getOrElse(getTokenRenewer(serviceConf)) +nnsToAccess.foreach { dst => + val dstFs = dst.getFileSystem(serviceConf) + logInfo("getting token for namenode: " + dst) + tokens ++= dstFs.addDelegationTokens(renewer, creds) +} + +tokens.toArray + } + + override def getTokenRenewalInterval(sparkConf: SparkConf, serviceConf: Configuration): Long = { +// We cannot use the tokens generated above since those have renewer yarn. Trying to renew --- End diff -- update comment to be more specific then "above" since this has moved --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #13701: [SPARK-15639][SQL] Try to push down filter at RowGroups ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/13701 @yhuai BTW, when reading more row groups, the performance improvement is much larger.

Before this patch:
Java HotSpot(TM) 64-Bit Server VM 1.8.0_71-b15 on Linux 3.19.0-25-generic
Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz
Parquet reader:           Best/Avg Time(ms)    Rate(M/s)    Per Row(ns)    Relative
reading Parquet file            1416 / 2168          1.4          691.3        1.0X

After this patch:
Java HotSpot(TM) 64-Bit Server VM 1.8.0_71-b15 on Linux 3.19.0-25-generic
Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz
Parquet reader:           Best/Avg Time(ms)    Rate(M/s)    Per Row(ns)    Relative
reading Parquet file             246 /  334          8.3          120.3        1.0X

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
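For context, a minimal way to exercise this code path from the DataFrame API looks roughly like the following sketch; the input path and predicate are placeholders, and `spark.sql.parquet.filterPushdown` is the flag that controls whether eligible predicates are handed down to the Parquet reader.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("ParquetFilterPushdown").getOrCreate()

// Make sure Parquet filter pushdown is on so eligible predicates can be used to
// skip whole row groups based on their statistics instead of scanning every row.
spark.conf.set("spark.sql.parquet.filterPushdown", "true")

// "/tmp/events.parquet" and the predicate are placeholders for illustration.
val events = spark.read.parquet("/tmp/events.parquet")
events.filter("value > 100").count()
```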
[GitHub] spark pull request #14065: [SPARK-14743][YARN][WIP] Add a configurable token...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/14065#discussion_r69916415 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/token/AMDelegationTokenRenewer.scala --- @@ -171,10 +174,9 @@ private[yarn] class AMDelegationTokenRenewer( keytabLoggedInUGI.doAs(new PrivilegedExceptionAction[Void] { // Get a copy of the credentials override def run(): Void = { -val nns = YarnSparkHadoopUtil.get.getNameNodesToAccess(sparkConf) + dst -hadoopUtil.obtainTokensForNamenodes(nns, freshHadoopConf, tempCreds) -hadoopUtil.obtainTokenForHiveMetastore(sparkConf, freshHadoopConf, tempCreds) -hadoopUtil.obtainTokenForHBase(sparkConf, freshHadoopConf, tempCreds) +hdfsTokenProvider(sparkConf).setNameNodesToAccess(sparkConf, Set(dst)) +hdfsTokenProvider(sparkConf).setTokenRenewer(null) --- End diff -- Why are we setting this to null here? Is it supposed to indicate that it's not supposed to be renewed internally? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #13620: [SPARK-15590] [WEBUI] Paginate Job Table in Jobs tab
Github user nblintao commented on the issue: https://github.com/apache/spark/pull/13620 I believe this commit has resolved the bugs reported by @ajbozarth. It now displays correctly on the history server pages, and it keeps the state of the other tables while one of them is being changed. Could you please help test or review it if you are available? Thanks! @andrewor14 @zsxwing @ajbozarth --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14091: [SPARK-16412][SQL] Generate Java code that gets an array...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14091 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14091: [SPARK-16412][SQL] Generate Java code that gets an array...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14091 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61913/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14091: [SPARK-16412][SQL] Generate Java code that gets an array...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14091 **[Test build #61913 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61913/consoleFull)** for PR 14091 at commit [`54df41c`](https://github.com/apache/spark/commit/54df41c8691f02dd9eac3eef3d816a130b87a5c9). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14091: [SPARK-16412][SQL] Generate Java code that gets an array...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14091 **[Test build #61913 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61913/consoleFull)** for PR 14091 at commit [`54df41c`](https://github.com/apache/spark/commit/54df41c8691f02dd9eac3eef3d816a130b87a5c9). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14091: [SPARK-16412][SQL] Generate Java code that gets a...
GitHub user kiszk opened a pull request: https://github.com/apache/spark/pull/14091 [SPARK-16412][SQL] Generate Java code that gets an array in each column of CachedBatch when DataFrame.cache() is called ## What changes were proposed in this pull request? Waiting for #11956 to be merged. This PR generates Java code to directly get an array from each column of CachedBatch when DataFrame.cache() is called. This is done in whole-stage code generation. When DataFrame.cache() is called, data is stored as column-oriented storage (the columnar cache) in CachedBatch. This PR avoids the conversion from column-oriented storage to row-oriented storage, and it handles an array type that is stored in a column. This PR generates code for both row-oriented storage and column-oriented storage only if (1) InMemoryColumnarTableScan exists in a plan sub-tree (the decision is made by checking whether a given iterator is a ColumnarIterator at runtime), and (2) no sort or join exists in the plan sub-tree. It generates Java code for the columnar cache only if the types of all columns accessed by the operations are primitive or array types. I will add benchmark suites [here](https://github.com/kiszk/spark/blob/SPARK-14098/sql/core/src/test/scala/org/apache/spark/sql/DataFrameCacheBenchmark.scala) ## How was this patch tested? Added new tests into `DataFrameCacheSuite.scala` You can merge this pull request into a Git repository by running: $ git pull https://github.com/kiszk/spark SPARK-16412 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14091.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14091 commit 09af5a5851786b918f45c6f997b1c357745fe883 Author: Kazuaki Ishizaki Date: 2016-07-07T10:36:14Z support codegen for an array in CachedBatch commit 8e218e38d5acb6c04db221fcd3cd6d2483926552 Author: Kazuaki Ishizaki Date: 2016-07-07T10:36:34Z update test suites commit 54df41c8691f02dd9eac3eef3d816a130b87a5c9 Author: Kazuaki Ishizaki Date: 2016-07-07T13:18:58Z remove debug print --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
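A small sketch of how the path described above would be exercised from user code, assuming a Spark 2.x `SparkSession`; nothing here is specific to this PR beyond the fact that `cache()` stores data in the in-memory columnar format.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("ColumnarCacheSketch").getOrCreate()
import spark.implicits._

// A DataFrame with a primitive column and an array column, cached in the
// in-memory columnar format (CachedBatch).
val df = spark.range(0, 1024 * 1024)
  .select($"id", array($"id", $"id" + 1).as("pair"))
  .cache()

df.count()                    // first action materializes the columnar cache
df.select(sum($"id")).show()  // subsequent scans read from the cached batches
```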
[GitHub] spark issue #13620: [SPARK-15590] [WEBUI] Paginate Job Table in Jobs tab
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13620 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #13620: [SPARK-15590] [WEBUI] Paginate Job Table in Jobs tab
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13620 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61910/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #13620: [SPARK-15590] [WEBUI] Paginate Job Table in Jobs tab
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13620 **[Test build #61910 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61910/consoleFull)** for PR 13620 at commit [`649bb19`](https://github.com/apache/spark/commit/649bb195ac0eaf6cee4a84dd8ff1198900e8789a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14089: [SPARK-16415][SQL] fix catalog string error
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14089 **[Test build #61909 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61909/consoleFull)** for PR 14089 at commit [`eb10181`](https://github.com/apache/spark/commit/eb1018108a879a06701c3dff539ef8d10ab2b118). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14089: [SPARK-16415][SQL] fix catalog string error
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14089 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r69910685 --- Diff: yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala --- @@ -125,8 +125,11 @@ private[spark] abstract class YarnSchedulerBackend( * This includes executors already pending or running. */ override def doRequestTotalExecutors(requestedTotal: Int): Boolean = { + +val nodeBlacklist: Set[String] = scheduler.blacklistTracker.nodeBlacklist() --- End diff -- this is safe to call without a lock on the task scheduler because the nodeBlacklist is stored in an AtomicReference, which is set whenever the node blacklist changes. We could just get a lock on the task scheduler here -- but I felt that it would just be harder to ensure correctness (not just in this change, but making sure the right lock was always held through future changes), and the only real cost is that we duplicate the set of nodes in the blacklist, which is hopefully very small. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
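The pattern described above, reduced to a standalone sketch (names are illustrative, not the PR's actual classes): the writer publishes an immutable snapshot through an `AtomicReference`, so readers can take a consistent copy without acquiring the scheduler's lock.

```scala
import java.util.concurrent.atomic.AtomicReference

class NodeBlacklistSnapshot {
  // Always holds an immutable Set; readers just call get(), no locking needed.
  private val snapshot = new AtomicReference[Set[String]](Set.empty)

  // Writer side: recompute under whatever lock guards the bookkeeping, then
  // publish the new set atomically by replacing (never mutating) the old one.
  def publish(blacklistedNodes: Set[String]): Unit = snapshot.set(blacklistedNodes)

  def current(): Set[String] = snapshot.get()
}

val bl = new NodeBlacklistSnapshot
bl.publish(Set("host-1", "host-7"))
assert(bl.current().contains("host-7"))
```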
[GitHub] spark issue #14090: [SPARK-16112][SparkR] Programming guide for gapply/gappl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14090 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14090: [SPARK-16112][SparkR] Programming guide for gapply/gappl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14090 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61911/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14090: [SPARK-16112][SparkR] Programming guide for gapply/gappl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14090 **[Test build #61911 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61911/consoleFull)** for PR 14090 at commit [`7781d1c`](https://github.com/apache/spark/commit/7781d1c111f38e3608d5ebd468e6d344d52efa5c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #11956: [SPARK-14098][SQL] Generate Java code that gets a float/...
Github user a-roberts commented on the issue: https://github.com/apache/spark/pull/11956 @robbinspg and I are evaluating this from a functional and performance perspective, full disclosure: we both work for IBM with @kiszk. All unit tests pass including the new ones Ishizaki has added, we've tested this on a variety of platforms, both big and little-endian. This is with IBM Java 8 and tested on three different architectures. We can run the benchmark with ``` bin/spark-submit --class org.apache.spark.sql.DataFrameCacheBenchmark sql/core/target/spark-sql_2.11-2.0.0-tests.jar ``` or can be run against branch-2.0 (Spark 2.0.1 snapshot) with ``` bin/spark-submit --class org.apache.spark.sql.DataFrameCacheBenchmark sql/core/target/spark-sql_2.11-2.0.1-SNAPSHOT-tests.jar ``` Performance results on a few low powered testing systems are promising. Linux on Intel: 5.3x increase ``` Stopped after 15 iterations, 2127 ms IBM J9 VM pxa6480sr3-20160428_01 (SR3) on Linux 3.13.0-65-generic Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz Float Sum with PassThrough cache:Best/Avg Time(ms)Rate(M/s) Per Row(ns) Relative InternalRow codegen669 / 829 47.1 21.3 1.0X ColumnVector codegen 127 / 142248.2 4.0 5.3X ``` Linux on Z: 2.7x increase ``` Stopped after 5 iterations, 2068 ms IBM J9 VM pxz6480sr3-20160428_01 (SR3) on Linux 3.12.43-52.6-default 16/07/07 09:48:15 ERROR Utils: Process List(/usr/bin/grep, -m, 1, model name, /proc/cpuinfo) exited with code 1: Unknown processor Float Sum with PassThrough cache:Best/Avg Time(ms)Rate(M/s) Per Row(ns) Relative InternalRow codegen997 / 1134 31.5 31.7 1.0X ColumnVector codegen 371 / 414 84.7 11.8 2.7X ``` Linux on Power: 6.4x increase ``` Stopped after 7 iterations, 2099 ms IBM J9 VM pxl6480sr3-20160428_01 (SR3) on Linux 3.13.0-61-generic 16/07/07 14:33:40 ERROR Utils: Process List(/bin/grep, -m, 1, model name, /proc/cpuinfo) exited with code 1: Unknown processor Float Sum with PassThrough cache:Best/Avg Time(ms)Rate(M/s) Per Row(ns) Relative InternalRow codegen 1199 / 1212 26.2 38.1 1.0X ColumnVector codegen 186 / 300168.8 5.9 6.4X ``` So the performance increase and functionality is solid across platforms, Ishizaki has tested this with OpenJDK 8 also. One improvement would be add a scale factor parameter so we can use more data than: ``` doubleSumBenchmark(1024 * 1024 * 15) floatSumBenchmark(1024 * 1024 * 30) ``` and with no parameter we'd use the above as a standard/baseline. Would also be useful to have the master url as a parameter so we can easily run this using many machines or with more cores to see the performance/functional impact when we scale (exercising various JIT levels for example) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
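One possible shape for the scale-factor suggestion above, sketched under the assumption that the benchmark's `main` can take optional arguments; the argument handling and object name are hypothetical, not the existing DataFrameCacheBenchmark code.

```scala
// Hypothetical wrapper: `spark-submit --class ... <tests-jar> 4` would run the
// benchmark on 4x the default data sizes; no argument keeps today's baseline.
object DataFrameCacheBenchmarkMain {
  def main(args: Array[String]): Unit = {
    val scale = args.headOption.map(_.toInt).getOrElse(1)
    val doubleRows = 1024 * 1024 * 15 * scale
    val floatRows  = 1024 * 1024 * 30 * scale
    println(s"Running with scale=$scale: $doubleRows double rows, $floatRows float rows")
    // doubleSumBenchmark(doubleRows)   // would call into the existing suite
    // floatSumBenchmark(floatRows)
  }
}
```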
[GitHub] spark issue #14065: [SPARK-14743][YARN][WIP] Add a configurable token manage...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14065 **[Test build #61912 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61912/consoleFull)** for PR 14065 at commit [`c9d9ed0`](https://github.com/apache/spark/commit/c9d9ed0cd0aef6b8017c04e635f27ef123a48887). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14089: [SPARK-16415][SQL] fix catalog string error
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14089 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61909/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14090: [SPARK-16112][SparkR] Programming guide for gapply/gappl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14090 **[Test build #61911 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61911/consoleFull)** for PR 14090 at commit [`7781d1c`](https://github.com/apache/spark/commit/7781d1c111f38e3608d5ebd468e6d344d52efa5c). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14090: [SPARK-16112][SparkR] Programming guide for gappl...
GitHub user NarineK opened a pull request: https://github.com/apache/spark/pull/14090 [SPARK-16112][SparkR] Programming guide for gapply/gapplyCollect ## What changes were proposed in this pull request? Updates the programming guide for spark.gapply/spark.gapplyCollect. Similar to the other examples, I used the faithful dataset to demonstrate gapply's functionality. Please let me know if you prefer another example. ## How was this patch tested? Existing test cases in R You can merge this pull request into a Git repository by running: $ git pull https://github.com/NarineK/spark gapplyProgGuide Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14090.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14090 commit 29d8a5c6c22202cdf7d6cc44f1d6cbeca5946918 Author: Narine Kokhlikyan Date: 2016-06-20T22:12:11Z Fixed duplicated documentation problem + separated documentation for dapply and dapplyCollect commit 698c4331d2a8bfe7f4b372ebc8123b6c27a57e68 Author: Narine Kokhlikyan Date: 2016-06-23T18:51:48Z merge with master commit 85a4493a03b3601a93c25ebc1eafb2868efec8d8 Author: Narine Kokhlikyan Date: 2016-07-07T13:18:49Z Adding programming guide for gapply/gapplyCollect commit 7781d1c111f38e3608d5ebd468e6d344d52efa5c Author: Narine Kokhlikyan Date: 2016-07-07T13:27:35Z removing output format --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user janplus commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69905666 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -652,6 +654,145 @@ case class StringRPad(str: Expression, len: Expression, pad: Expression) override def prettyName: String = "rpad" } +object ParseUrl { + private val HOST = UTF8String.fromString("HOST") + private val PATH = UTF8String.fromString("PATH") + private val QUERY = UTF8String.fromString("QUERY") + private val REF = UTF8String.fromString("REF") + private val PROTOCOL = UTF8String.fromString("PROTOCOL") + private val FILE = UTF8String.fromString("FILE") + private val AUTHORITY = UTF8String.fromString("AUTHORITY") + private val USERINFO = UTF8String.fromString("USERINFO") + private val REGEXPREFIX = "(&|^)" + private val REGEXSUBFIX = "=([^&]*)" +} + +/** + * Extracts a part from a URL + */ +@ExpressionDescription( + usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL", + extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO. +Key specifies which query to extract. +Examples: + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST') + 'spark.apache.org' + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY') + 'query=1' + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 'query') + '1'""") +case class ParseUrl(children: Seq[Expression]) + extends Expression with ImplicitCastInputTypes with CodegenFallback { + + override def nullable: Boolean = true + override def inputTypes: Seq[DataType] = Seq.fill(children.size)(StringType) + override def dataType: DataType = StringType + override def prettyName: String = "parse_url" + + // If the url is a constant, cache the URL object so that we don't need to convert url + // from UTF8String to String to URL for every row. + @transient private lazy val cachedUrl = children(0) match { +case Literal(url: UTF8String, _) => getUrl(url) +case _ => null + } + + // If the key is a constant, cache the Pattern object so that we don't need to convert key + // from UTF8String to String to StringBuilder to String to Pattern for every row. + @transient private lazy val cachedPattern = children(2) match { +case Literal(key: UTF8String, _) => getPattern(key) --- End diff -- Test in 2.10.5 too. The result is the same. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #11157: [SPARK-11714][Mesos] Make Spark on Mesos honor port rest...
Github user skonto commented on the issue: https://github.com/apache/spark/pull/11157 WIP. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #11157: [SPARK-11714][Mesos] Make Spark on Mesos honor po...
Github user skonto commented on a diff in the pull request: https://github.com/apache/spark/pull/11157#discussion_r69898133 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala --- @@ -356,4 +374,233 @@ private[mesos] trait MesosSchedulerUtils extends Logging { sc.conf.getTimeAsSeconds("spark.mesos.rejectOfferDurationForReachedMaxCores", "120s") } + /** + * Checks executor ports if they are within some range of the offered list of ports ranges, + * + * @param sc the Spark Context + * @param ports the list of ports to check + * @return true if ports are within range false otherwise + */ + protected def checkPorts(sc: SparkContext, ports: List[(Long, Long)]): Boolean = { + +def checkIfInRange(port: Long, ps: List[(Long, Long)]): Boolean = { + ps.exists(r => r._1 <= port & r._2 >= port) +} + +val portsToCheck = ManagedPorts.getPortValues(sc.conf) +val nonZeroPorts = portsToCheck.filter(_ != 0) +val withinRange = nonZeroPorts.forall(p => checkIfInRange(p, ports)) +// make sure we have enough ports to allocate per offer +ports.map(r => r._2 - r._1 + 1).sum >= portsToCheck.size && withinRange + } + + /** + * Partitions port resources. + * + * @param conf the spark config + * @param ports the ports offered + * @return resources left, port resources to be used and the list of assigned ports + */ + def partitionPorts( + conf: SparkConf, --- End diff -- Ok. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69895008 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -652,6 +654,160 @@ case class StringRPad(str: Expression, len: Expression, pad: Expression) override def prettyName: String = "rpad" } +object ParseUrl { + private val HOST = UTF8String.fromString("HOST") + private val PATH = UTF8String.fromString("PATH") + private val QUERY = UTF8String.fromString("QUERY") + private val REF = UTF8String.fromString("REF") + private val PROTOCOL = UTF8String.fromString("PROTOCOL") + private val FILE = UTF8String.fromString("FILE") + private val AUTHORITY = UTF8String.fromString("AUTHORITY") + private val USERINFO = UTF8String.fromString("USERINFO") + private val REGEXPREFIX = "(&|^)" + private val REGEXSUBFIX = "=([^&]*)" +} + +/** + * Extracts a part from a URL + */ +@ExpressionDescription( + usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL", + extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO. +Key specifies which query to extract. +Examples: + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST') + 'spark.apache.org' + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY') + 'query=1' + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 'query') + '1'""") +case class ParseUrl(children: Seq[Expression]) + extends Expression with ImplicitCastInputTypes with CodegenFallback { + + override def nullable: Boolean = true + override def inputTypes: Seq[DataType] = Seq.fill(children.size)(StringType) + override def dataType: DataType = StringType + override def prettyName: String = "parse_url" + + // If the url is a constant, cache the URL object so that we don't need to convert url + // from UTF8String to String to URL for every row. + @transient private lazy val cachedUrl = stringExprs(0) match { +case Literal(url: UTF8String, _) => getUrl(url) +case _ => null + } + + // If the key is a constant, cache the Pattern object so that we don't need to convert key + // from UTF8String to String to StringBuilder to String to Pattern for every row. 
+ @transient private lazy val cachedPattern = stringExprs(2) match { +case Literal(key: UTF8String, _) => getPattern(key) +case _ => null + } + + private lazy val stringExprs = children.toArray + import ParseUrl._ + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.size > 3 || children.size < 2) { + TypeCheckResult.TypeCheckFailure(s"$prettyName function requires two or three arguments") +} else { + super[ImplicitCastInputTypes].checkInputDataTypes() +} + } + + private def getPattern(key: UTF8String): Pattern = { +if (key != null) { + Pattern.compile(REGEXPREFIX + key.toString + REGEXSUBFIX) +} else { + null +} + } + + private def getUrl(url: UTF8String): URL = { +try { + new URL(url.toString) +} catch { + case e: MalformedURLException => null +} + } + + private def extractValueFromQuery(query: UTF8String, pattern: Pattern): UTF8String = { +val m = pattern.matcher(query.toString) +if (m.find()) { + UTF8String.fromString(m.group(2)) +} else { + null +} + } + + private def extractFromUrl(url: URL, partToExtract: UTF8String): UTF8String = { +if (partToExtract.equals(HOST)) { + UTF8String.fromString(url.getHost) +} else if (partToExtract.equals(PATH)) { + UTF8String.fromString(url.getPath) +} else if (partToExtract.equals(QUERY)) { + UTF8String.fromString(url.getQuery) +} else if (partToExtract.equals(REF)) { + UTF8String.fromString(url.getRef) +} else if (partToExtract.equals(PROTOCOL)) { + UTF8String.fromString(url.getProtocol) +} else if (partToExtract.equals(FILE)) { + UTF8String.fromString(url.getFile) +} else if (partToExtract.equals(AUTHORITY)) { + UTF8String.fromString(url.getAuthority) +} else if (partToExtract.equals(USERINFO)) { + UTF8String.fromString(url.getUserInfo) +} else { + null --- End diff -- It depends on how many users will call `parse_url` in this way. Personally I think using literal `part` is more natural. But we need more thoughts here, cc @liancheng @clockfly --- If your project is set up for it, you can reply to this email and have your reply
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user janplus commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69894466 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -652,6 +654,160 @@ case class StringRPad(str: Expression, len: Expression, pad: Expression) override def prettyName: String = "rpad" } +object ParseUrl { + private val HOST = UTF8String.fromString("HOST") + private val PATH = UTF8String.fromString("PATH") + private val QUERY = UTF8String.fromString("QUERY") + private val REF = UTF8String.fromString("REF") + private val PROTOCOL = UTF8String.fromString("PROTOCOL") + private val FILE = UTF8String.fromString("FILE") + private val AUTHORITY = UTF8String.fromString("AUTHORITY") + private val USERINFO = UTF8String.fromString("USERINFO") + private val REGEXPREFIX = "(&|^)" + private val REGEXSUBFIX = "=([^&]*)" +} + +/** + * Extracts a part from a URL + */ +@ExpressionDescription( + usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL", + extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO. +Key specifies which query to extract. +Examples: + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST') + 'spark.apache.org' + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY') + 'query=1' + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 'query') + '1'""") +case class ParseUrl(children: Seq[Expression]) + extends Expression with ImplicitCastInputTypes with CodegenFallback { + + override def nullable: Boolean = true + override def inputTypes: Seq[DataType] = Seq.fill(children.size)(StringType) + override def dataType: DataType = StringType + override def prettyName: String = "parse_url" + + // If the url is a constant, cache the URL object so that we don't need to convert url + // from UTF8String to String to URL for every row. + @transient private lazy val cachedUrl = stringExprs(0) match { +case Literal(url: UTF8String, _) => getUrl(url) +case _ => null + } + + // If the key is a constant, cache the Pattern object so that we don't need to convert key + // from UTF8String to String to StringBuilder to String to Pattern for every row. 
+ @transient private lazy val cachedPattern = stringExprs(2) match { +case Literal(key: UTF8String, _) => getPattern(key) +case _ => null + } + + private lazy val stringExprs = children.toArray + import ParseUrl._ + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.size > 3 || children.size < 2) { + TypeCheckResult.TypeCheckFailure(s"$prettyName function requires two or three arguments") +} else { + super[ImplicitCastInputTypes].checkInputDataTypes() +} + } + + private def getPattern(key: UTF8String): Pattern = { +if (key != null) { + Pattern.compile(REGEXPREFIX + key.toString + REGEXSUBFIX) +} else { + null +} + } + + private def getUrl(url: UTF8String): URL = { +try { + new URL(url.toString) +} catch { + case e: MalformedURLException => null +} + } + + private def extractValueFromQuery(query: UTF8String, pattern: Pattern): UTF8String = { +val m = pattern.matcher(query.toString) +if (m.find()) { + UTF8String.fromString(m.group(2)) +} else { + null +} + } + + private def extractFromUrl(url: URL, partToExtract: UTF8String): UTF8String = { +if (partToExtract.equals(HOST)) { + UTF8String.fromString(url.getHost) +} else if (partToExtract.equals(PATH)) { + UTF8String.fromString(url.getPath) +} else if (partToExtract.equals(QUERY)) { + UTF8String.fromString(url.getQuery) +} else if (partToExtract.equals(REF)) { + UTF8String.fromString(url.getRef) +} else if (partToExtract.equals(PROTOCOL)) { + UTF8String.fromString(url.getProtocol) +} else if (partToExtract.equals(FILE)) { + UTF8String.fromString(url.getFile) +} else if (partToExtract.equals(AUTHORITY)) { + UTF8String.fromString(url.getAuthority) +} else if (partToExtract.equals(USERINFO)) { + UTF8String.fromString(url.getUserInfo) +} else { + null --- End diff -- I've though this again, row level check seems inevitable. Since we can not limit `part` to be a `Literal`. eg. `select parse_url(url, part) from url_data`. Thus, throw `AnalysisException` here seems not suitable. How do you think? --- If your project is set up
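A simplified sketch of the behaviour being argued for: because `partToExtract` can come from a column rather than a literal, validity can only be decided per row, so an unrecognised part yields null instead of an analysis-time error. This illustrates the idea only and is not the PR's actual `extractFromUrl` implementation.

```scala
import java.net.URL
import org.apache.spark.unsafe.types.UTF8String

object ParseUrlSketch {
  // Per-row dispatch: a bad `part` value in some row simply produces null for
  // that row; we cannot reject it up front when `part` is a column reference.
  def extractPart(url: URL, part: UTF8String): UTF8String = part.toString match {
    case "HOST"  => UTF8String.fromString(url.getHost)
    case "PATH"  => UTF8String.fromString(url.getPath)
    case "QUERY" => UTF8String.fromString(url.getQuery)
    case _       => null
  }
}
```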
[GitHub] spark issue #13620: [SPARK-15590] [WEBUI] Paginate Job Table in Jobs tab
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13620 **[Test build #61910 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61910/consoleFull)** for PR 13620 at commit [`649bb19`](https://github.com/apache/spark/commit/649bb195ac0eaf6cee4a84dd8ff1198900e8789a). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69893880 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -652,6 +654,160 @@ case class StringRPad(str: Expression, len: Expression, pad: Expression) override def prettyName: String = "rpad" } +object ParseUrl { + private val HOST = UTF8String.fromString("HOST") + private val PATH = UTF8String.fromString("PATH") + private val QUERY = UTF8String.fromString("QUERY") + private val REF = UTF8String.fromString("REF") + private val PROTOCOL = UTF8String.fromString("PROTOCOL") + private val FILE = UTF8String.fromString("FILE") + private val AUTHORITY = UTF8String.fromString("AUTHORITY") + private val USERINFO = UTF8String.fromString("USERINFO") + private val REGEXPREFIX = "(&|^)" + private val REGEXSUBFIX = "=([^&]*)" +} + +/** + * Extracts a part from a URL + */ +@ExpressionDescription( + usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL", + extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO. +Key specifies which query to extract. +Examples: + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST') + 'spark.apache.org' + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY') + 'query=1' + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 'query') + '1'""") +case class ParseUrl(children: Seq[Expression]) + extends Expression with ImplicitCastInputTypes with CodegenFallback { + + override def nullable: Boolean = true + override def inputTypes: Seq[DataType] = Seq.fill(children.size)(StringType) + override def dataType: DataType = StringType + override def prettyName: String = "parse_url" + + // If the url is a constant, cache the URL object so that we don't need to convert url + // from UTF8String to String to URL for every row. + @transient private lazy val cachedUrl = stringExprs(0) match { +case Literal(url: UTF8String, _) => getUrl(url) +case _ => null + } + + // If the key is a constant, cache the Pattern object so that we don't need to convert key + // from UTF8String to String to StringBuilder to String to Pattern for every row. 
+ @transient private lazy val cachedPattern = stringExprs(2) match { +case Literal(key: UTF8String, _) => getPattern(key) +case _ => null + } + + private lazy val stringExprs = children.toArray + import ParseUrl._ + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.size > 3 || children.size < 2) { + TypeCheckResult.TypeCheckFailure(s"$prettyName function requires two or three arguments") +} else { + super[ImplicitCastInputTypes].checkInputDataTypes() +} + } + + private def getPattern(key: UTF8String): Pattern = { +if (key != null) { + Pattern.compile(REGEXPREFIX + key.toString + REGEXSUBFIX) +} else { + null +} + } + + private def getUrl(url: UTF8String): URL = { +try { + new URL(url.toString) +} catch { + case e: MalformedURLException => null +} + } + + private def extractValueFromQuery(query: UTF8String, pattern: Pattern): UTF8String = { +val m = pattern.matcher(query.toString) +if (m.find()) { + UTF8String.fromString(m.group(2)) +} else { + null +} + } + + private def extractFromUrl(url: URL, partToExtract: UTF8String): UTF8String = { +if (partToExtract.equals(HOST)) { + UTF8String.fromString(url.getHost) +} else if (partToExtract.equals(PATH)) { + UTF8String.fromString(url.getPath) +} else if (partToExtract.equals(QUERY)) { + UTF8String.fromString(url.getQuery) +} else if (partToExtract.equals(REF)) { + UTF8String.fromString(url.getRef) +} else if (partToExtract.equals(PROTOCOL)) { + UTF8String.fromString(url.getProtocol) +} else if (partToExtract.equals(FILE)) { + UTF8String.fromString(url.getFile) +} else if (partToExtract.equals(AUTHORITY)) { + UTF8String.fromString(url.getAuthority) +} else if (partToExtract.equals(USERINFO)) { + UTF8String.fromString(url.getUserInfo) +} else { + null --- End diff -- yup. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at
[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r69893800 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -310,12 +342,38 @@ private[spark] class TaskSchedulerImpl( } } -// Randomly shuffle offers to avoid always placing tasks on the same set of workers. -val shuffledOffers = Random.shuffle(offers) +// ensure that we periodically check if executors can be removed from the blacklist, without +// requiring a separate thread and added synchronization overhead +blacklistTracker.expireExecutorsInBlacklist() + +val sortedTaskSets = rootPool.getSortedTaskSetQueue +val filteredOffers: IndexedSeq[WorkerOffer] = offers.filter { offer => + !blacklistTracker.isNodeBlacklisted(offer.host) && +!blacklistTracker.isExecutorBlacklisted(offer.executorId) +} match { +// toIndexedSeq always makes an *immutable* IndexedSeq, though we don't care if its mutable +// or immutable. So we do this to avoid making a pointless copy + case is: IndexedSeq[WorkerOffer] => is + case other: Seq[WorkerOffer] => other.toIndexedSeq +} --- End diff -- this business about `IndexedSeq[WorkerOffer]` vs `Seq[WorkerOffer]` is also unrelated to blacklisting, but I ran into it accidentally while doing some performance tests. While `resourceOffer` accepts a `Seq`, it really ought to be an `IndexedSeq` given how its used internally (eg. given 500 offers, there is a 5x performance difference in the scheduler). I made this change just because its more locally contained ... alternatively we could change the method signature and the callsites appropriately. It *happens* to be an IndexedSeq in the [important callsite in CoarseGrainedSchedulerBackend](https://github.com/apache/spark/blob/a04cab8f17fcac05f86d2c472558ab98923f91e3/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L217), but that is more by chance than design (a call to `.toSeq` just happens to return an `IndexedSeq` for the particular types used) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
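A toy illustration of why the static type matters here: positional access is constant time on an `IndexedSeq` such as `Vector` but linear on a `List`, so index-based loops over the offers degrade badly when the scheduler happens to receive a `List`-backed `Seq`. Names and sizes below are illustrative only.

```scala
// Same loop, very different cost depending on the runtime collection:
// xs(i) is O(1) on Vector/ArrayBuffer but O(i) on List, which makes the whole
// loop quadratic for List-backed Seqs.
def sumByIndex(xs: Seq[Int]): Int = {
  var total = 0
  var i = 0
  while (i < xs.length) {
    total += xs(i)
    i += 1
  }
  total
}

val asList: Seq[Int] = List.tabulate(500)(identity)             // linear indexing
val asVector: IndexedSeq[Int] = Vector.tabulate(500)(identity)  // constant-time indexing

sumByIndex(asList)    // works, but traverses ~i elements per access
sumByIndex(asVector)  // effectively one step per access
```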
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69893790 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -652,6 +654,145 @@ case class StringRPad(str: Expression, len: Expression, pad: Expression) override def prettyName: String = "rpad" } +object ParseUrl { + private val HOST = UTF8String.fromString("HOST") + private val PATH = UTF8String.fromString("PATH") + private val QUERY = UTF8String.fromString("QUERY") + private val REF = UTF8String.fromString("REF") + private val PROTOCOL = UTF8String.fromString("PROTOCOL") + private val FILE = UTF8String.fromString("FILE") + private val AUTHORITY = UTF8String.fromString("AUTHORITY") + private val USERINFO = UTF8String.fromString("USERINFO") + private val REGEXPREFIX = "(&|^)" + private val REGEXSUBFIX = "=([^&]*)" +} + +/** + * Extracts a part from a URL + */ +@ExpressionDescription( + usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL", + extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO. +Key specifies which query to extract. +Examples: + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST') + 'spark.apache.org' + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY') + 'query=1' + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 'query') + '1'""") +case class ParseUrl(children: Seq[Expression]) + extends Expression with ImplicitCastInputTypes with CodegenFallback { + + override def nullable: Boolean = true + override def inputTypes: Seq[DataType] = Seq.fill(children.size)(StringType) + override def dataType: DataType = StringType + override def prettyName: String = "parse_url" + + // If the url is a constant, cache the URL object so that we don't need to convert url + // from UTF8String to String to URL for every row. + @transient private lazy val cachedUrl = children(0) match { +case Literal(url: UTF8String, _) => getUrl(url) +case _ => null + } + + // If the key is a constant, cache the Pattern object so that we don't need to convert key + // from UTF8String to String to StringBuilder to String to Pattern for every row. + @transient private lazy val cachedPattern = children(2) match { +case Literal(key: UTF8String, _) => getPattern(key) --- End diff -- can you also try scala 2.10? It's also an officially supported scala version for spark. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14004: [SPARK-16285][SQL] Implement sentences SQL functions
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14004 and one comment for the old thread: https://github.com/apache/spark/pull/14004#discussion_r69893328 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14004: [SPARK-16285][SQL] Implement sentences SQL functi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14004#discussion_r69893328 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala --- @@ -198,6 +203,66 @@ case class StringSplit(str: Expression, pattern: Expression) override def prettyName: String = "split" } +/** + * Splits a string into arrays of sentences, where each sentence is an array of words. + * The 'lang' and 'country' arguments are optional, and if omitted, the default locale is used. + */ +@ExpressionDescription( + usage = "_FUNC_(str, lang, country) - Splits str into an array of array of words.", + extended = "> SELECT _FUNC_('Hi there! Good morning.');\n [['Hi','there'], ['Good','morning']]") +case class Sentences( +str: Expression, +language: Expression = Literal(""), +country: Expression = Literal("")) + extends Expression with ImplicitCastInputTypes with CodegenFallback { + + def this(str: Expression) = this(str, Literal(""), Literal("")) + def this(str: Expression, language: Expression) = this(str, language, Literal("")) + + override def nullable: Boolean = true + override def dataType: DataType = +ArrayType(ArrayType(StringType, containsNull = false), containsNull = false) + override def inputTypes: Seq[AbstractDataType] = Seq(StringType, StringType, StringType) + override def children: Seq[Expression] = str :: language :: country :: Nil + + override def eval(input: InternalRow): Any = { +val string = str.eval(input) +if (string == null) { + null +} else { + val locale = try { +new Locale(language.eval(input).asInstanceOf[UTF8String].toString, + country.eval(input).asInstanceOf[UTF8String].toString) + } catch { +case _: NullPointerException | _: ClassCastException => Locale.getDefault --- End diff -- what do you mean by `ignored`? returning null or returning the default locale? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14004: [SPARK-16285][SQL] Implement sentences SQL functi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14004#discussion_r69893166 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala --- @@ -198,6 +203,67 @@ case class StringSplit(str: Expression, pattern: Expression) override def prettyName: String = "split" } +/** + * Splits a string into arrays of sentences, where each sentence is an array of words. + * The 'lang' and 'country' arguments are optional, and if omitted, the default locale is used. + */ +@ExpressionDescription( + usage = "_FUNC_(str, lang, country) - Splits str into an array of array of words.", + extended = "> SELECT _FUNC_('Hi there! Good morning.');\n [['Hi','there'], ['Good','morning']]") +case class Sentences( +str: Expression, +language: Expression = Literal(""), +country: Expression = Literal("")) + extends Expression with ImplicitCastInputTypes with CodegenFallback { + + def this(str: Expression) = this(str, Literal(""), Literal("")) + def this(str: Expression, language: Expression) = this(str, language, Literal("")) + + override def nullable: Boolean = true + override def dataType: DataType = +ArrayType(ArrayType(StringType, containsNull = false), containsNull = false) + override def inputTypes: Seq[AbstractDataType] = Seq(StringType, StringType, StringType) + override def children: Seq[Expression] = str :: language :: country :: Nil + + override def eval(input: InternalRow): Any = { +val string = str.eval(input) +if (string == null) { + null +} else { + var locale = Locale.getDefault + val lang = language.eval(input) + val coun = country.eval(input) + if (lang != null && coun != null) { --- End diff -- I'd like to write: ``` val languageStr = language.eval(input).asInstanceOf[UTF8String] val countryStr = country.eval(input).asInstanceOf[UTF8String] val locale = if (languageStr != null && countryStr != null) { new Locale(languageStr, countryStr) } else { Locale.getDefault } ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
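For reference, here is a standalone sketch of the null-check style of locale resolution suggested in the comment above, outside the Expression framework. Note that `java.util.Locale` takes `String` arguments, so the `UTF8String` values coming out of `eval` would need a `.toString` before being passed in; the object and method names here are illustrative only.

```scala
import java.util.Locale
import org.apache.spark.unsafe.types.UTF8String

object LocaleResolutionSketch {
  // Fall back to the default locale when either argument evaluates to null,
  // otherwise build a Locale from the two strings.
  def resolveLocale(languageStr: UTF8String, countryStr: UTF8String): Locale =
    if (languageStr != null && countryStr != null) {
      new Locale(languageStr.toString, countryStr.toString)
    } else {
      Locale.getDefault
    }
}
```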
[GitHub] spark pull request #13876: [SPARK-16174][SQL] Improve `OptimizeIn` optimizer...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13876 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #13876: [SPARK-16174][SQL] Improve `OptimizeIn` optimizer to rem...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/13876 thanks, merging to master! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14089: [SPARK-16415][SQL] fix catalog string error
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14089 **[Test build #61909 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61909/consoleFull)** for PR 14089 at commit [`eb10181`](https://github.com/apache/spark/commit/eb1018108a879a06701c3dff539ef8d10ab2b118). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13894: [SPARK-15254][DOC] Improve ML pipeline Cross Vali...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/13894#discussion_r69892610 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala --- @@ -56,7 +56,10 @@ private[ml] trait CrossValidatorParams extends ValidatorParams { /** * :: Experimental :: - * K-fold cross validation. + * CrossValidator begins by splitting the dataset into a set of non-overlapping randomly + * partitioned folds as separate training and test datasets e.g., with k=3 folds, --- End diff -- I think we can bring back the "folds, which are used as ..." part --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
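For readers following the doc-wording discussion, a minimal usage sketch of the class being documented, as it might appear in a spark-shell session; the estimator, evaluator, and grid values are arbitrary placeholders, not part of the PR.

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

val lr = new LogisticRegression()
val grid = new ParamGridBuilder()
  .addGrid(lr.regParam, Array(0.01, 0.1))
  .build()

// With k=3 folds, each fold is used once as the test dataset while the
// remaining 2/3 of the data is used for training.
val cv = new CrossValidator()
  .setEstimator(lr)
  .setEvaluator(new BinaryClassificationEvaluator())
  .setEstimatorParamMaps(grid)
  .setNumFolds(3)

// val cvModel = cv.fit(trainingData)  // trainingData: DataFrame, not shown here
```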
[GitHub] spark pull request #14089: [SPARK-16415][SQL] fix catalog string error
GitHub user adrian-wang opened a pull request: https://github.com/apache/spark/pull/14089 [SPARK-16415][SQL] fix catalog string error ## What changes were proposed in this pull request? In #13537 we truncate `simpleString` if it is a long `StructType`. But sometimes we need `catalogString` to reconstruct `TypeInfo`, for example in the description of [SPARK-16415](https://issues.apache.org/jira/browse/SPARK-16415). So we need to keep the implementation of `catalogString` unaffected by the truncation. ## How was this patch tested? Added a test case. You can merge this pull request into a Git repository by running: $ git pull https://github.com/adrian-wang/spark catalogstring Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14089.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14089 commit dc81b385a68c842715b3377f15f7b3009e45f0ce Author: Daoyuan Wang Date: 2016-07-06T03:18:07Z fix catalog string commit eb1018108a879a06701c3dff539ef8d10ab2b118 Author: Daoyuan Wang Date: 2016-07-07T11:41:39Z add a unit test --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
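To illustrate the distinction the PR description draws, a small spark-shell sketch under the assumption that `simpleString` may be truncated for long struct types (exact truncation behaviour depends on the Spark version and configuration) while `catalogString` must always spell out every field:

```scala
import org.apache.spark.sql.types._

// Build a wide struct so the rendered type string gets long.
val wide = StructType((1 to 100).map(i => StructField(s"col$i", IntegerType)))

// simpleString may elide fields for readability once the type is long enough,
// while catalogString is expected to list every field so that a Hive TypeInfo
// can be reconstructed from it.
println(wide.simpleString)
println(wide.catalogString)
```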
[GitHub] spark issue #14088: [SPARK-16414] [YARN] Fix bugs for "Can not get user conf...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14088 Can one of the admins verify this patch? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14088: Fix bugs for "Can not get user config when callin...
GitHub user sharkdtu opened a pull request: https://github.com/apache/spark/pull/14088 Fix bugs for "Can not get user config when calling SparkHadoopUtil.get.conf in other places" ## What changes were proposed in this pull request? Fix bugs for "Can not get user config when calling SparkHadoopUtil.get.conf in other places". The `SparkHadoopUtil` singleton was instantiated before `ApplicationMaster`, so the `sparkConf` and `conf` in the `SparkHadoopUtil` singleton didn't include the user's configuration. But other places, such as `DataSourceStrategy`, use `hadoopConf` in `SparkHadoopUtil`: ```scala ... case PhysicalOperation(projects, filters, l @ LogicalRelation(t: HadoopFsRelation, _)) => // See buildPartitionedTableScan for the reason that we need to create a shard // broadcast HadoopConf. val sharedHadoopConf = SparkHadoopUtil.get.conf val confBroadcast = t.sqlContext.sparkContext.broadcast(new SerializableConfiguration(sharedHadoopConf)) ... ``` ## How was this patch tested? Existing test cases are used. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sharkdtu/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14088.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14088 commit 55e66b21cdcd68861db0f1045186048c54b13153 Author: sharkdtu Date: 2016-07-07T11:04:11Z Fix bugs for "Can not get user config when calling SparkHadoopUtil.get.conf in other places, such as DataSourceStrategy" --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
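The propagation being discussed can be sketched in a local-mode snippet, assuming the usual behaviour that `spark.hadoop.*` entries in the user's SparkConf end up in the SparkContext's Hadoop configuration; the property name below is an arbitrary example, not something from the PR.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// User config carrying a spark.hadoop.* entry.
val conf = new SparkConf()
  .setMaster("local[*]")
  .setAppName("hadoop-conf-propagation")
  .set("spark.hadoop.my.custom.key", "my-value")

val sc = new SparkContext(conf)

// Built from the user's SparkConf: the spark.hadoop. prefix is stripped and
// the value is visible in the Hadoop Configuration.
println(sc.hadoopConfiguration.get("my.custom.key"))  // expected: my-value

// The bug described above is that on YARN the SparkHadoopUtil singleton (and
// therefore SparkHadoopUtil.get.conf, the instance DataSourceStrategy reads)
// can be created before these user settings are applied, so the same lookup
// there may come back null.
```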
[GitHub] spark issue #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Structured...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14087 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61908/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Structured...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14087 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Structured...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14087 **[Test build #61908 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61908/consoleFull)** for PR 14087 at commit [`ac82232`](https://github.com/apache/spark/commit/ac822323f35122b99c6aa4d9fce5874160266909). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14086: [SPARK-16410][SQL] Support `truncate` option in Overwrit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14086 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61907/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14086: [SPARK-16410][SQL] Support `truncate` option in Overwrit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14086 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14086: [SPARK-16410][SQL] Support `truncate` option in Overwrit...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14086 **[Test build #61907 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61907/consoleFull)** for PR 14086 at commit [`c1e4c41`](https://github.com/apache/spark/commit/c1e4c411c04458622a09c010feb8a8a5204f89c1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14016: [SPARK-16399] [PYSPARK] Force PYSPARK_PYTHON to p...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14016 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14016: [SPARK-16399] [PYSPARK] Force PYSPARK_PYTHON to python
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14016 Merged to master --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14049: [SPARK-16369][MLlib] tallSkinnyQR of RowMatrix should aw...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14049 @yinxusen if you resolve the conflicts I'll merge. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14051: [SPARK-16372][MLlib] Retag RDD to tallSkinnyQR of...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14051 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14051: [SPARK-16372][MLlib] Retag RDD to tallSkinnyQR of RowMat...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14051 Merged to master/2.0/1.6. I think it's a reasonably important bug fix. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #13494: [SPARK-15752] [SQL] Optimize metadata only query that ha...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13494 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61906/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #13494: [SPARK-15752] [SQL] Optimize metadata only query that ha...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13494 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #13494: [SPARK-15752] [SQL] Optimize metadata only query that ha...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13494 **[Test build #61906 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61906/consoleFull)** for PR 13494 at commit [`67211be`](https://github.com/apache/spark/commit/67211beb80c4d84fb70c6037cc53044f86f094d5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14076: [SPARK-16400][SQL] Remove InSet filter pushdown f...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14076 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14076: [SPARK-16400][SQL] Remove InSet filter pushdown from Par...
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14076 LGTM. Merging to master. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #13778: [SPARK-16062][SPARK-15989][SQL] Fix two bugs of Python-o...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/13778 ping @cloud-fan @vlad17 Anything else? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #13778: [SPARK-16062][SPARK-15989][SQL] Fix two bugs of Python-o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13778 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61903/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #13778: [SPARK-16062][SPARK-15989][SQL] Fix two bugs of Python-o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13778 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #13778: [SPARK-16062][SPARK-15989][SQL] Fix two bugs of Python-o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13778 **[Test build #61903 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61903/consoleFull)** for PR 13778 at commit [`87a0953`](https://github.com/apache/spark/commit/87a0953ec36d6beacb4665a94da834d0a4615baa). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #13701: [SPARK-15639][SQL] Try to push down filter at RowGroups ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13701 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61904/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #13701: [SPARK-15639][SQL] Try to push down filter at RowGroups ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13701 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #13701: [SPARK-15639][SQL] Try to push down filter at RowGroups ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13701 **[Test build #61904 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61904/consoleFull)** for PR 13701 at commit [`687d75b`](https://github.com/apache/spark/commit/687d75b2e12d45107600037955e8afca63128094). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org