[jira] [Commented] (SPARK-17057) ProbabilisticClassifierModels' prediction more reasonable with multi zero thresholds

2016-08-14 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420632#comment-15420632
 ] 

Apache Spark commented on SPARK-17057:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/14643

> ProbabilisticClassifierModels' prediction more reasonable with multi zero 
> thresholds
> 
>
> Key: SPARK-17057
> URL: https://issues.apache.org/jira/browse/SPARK-17057
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: zhengruifeng
>
> {code}
> val path = "./data/mllib/sample_multiclass_classification_data.txt"
> val data = spark.read.format("libsvm").load(path)
> val rfm = rf.fit(data)
> scala> rfm.setThresholds(Array(0.0,0.0,0.0))
> res4: org.apache.spark.ml.classification.RandomForestClassificationModel = 
> RandomForestClassificationModel (uid=rfc_cbe640b0eccc) with 20 trees
> scala> rfm.transform(data).show(5)
> +-----+--------------------+--------------+-------------+----------+
> |label|            features| rawPrediction|  probability|prediction|
> +-----+--------------------+--------------+-------------+----------+
> |  1.0| (4,[0,1,2,3],[-0...|[0.0,20.0,0.0]|[0.0,1.0,0.0]|       0.0|
> |  1.0| (4,[0,1,2,3],[-0...|[0.0,20.0,0.0]|[0.0,1.0,0.0]|       0.0|
> |  1.0| (4,[0,1,2,3],[-0...|[0.0,20.0,0.0]|[0.0,1.0,0.0]|       0.0|
> |  1.0| (4,[0,1,2,3],[-0...|[0.0,20.0,0.0]|[0.0,1.0,0.0]|       0.0|
> |  0.0|(4,[0,1,2,3],[0.1...|[20.0,0.0,0.0]|[1.0,0.0,0.0]|       0.0|
> +-----+--------------------+--------------+-------------+----------+
> only showing top 5 rows
> {code}
> If multiple thresholds are set to zero, the prediction of 
> {{ProbabilisticClassificationModel}} is simply the first index whose 
> corresponding threshold is 0.
> However, in this case, it would be more reasonable to mark as the 
> {{prediction}} the index with the maximum {{probability}} among the indices 
> whose threshold is 0.






[jira] [Assigned] (SPARK-17057) ProbabilisticClassifierModels' prediction more reasonable with multi zero thresholds

2016-08-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17057:


Assignee: (was: Apache Spark)

> ProbabilisticClassifierModels' prediction more reasonable with multi zero 
> thresholds
> 
>
> Key: SPARK-17057
> URL: https://issues.apache.org/jira/browse/SPARK-17057
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: zhengruifeng
>
> {code}
> val path = "./data/mllib/sample_multiclass_classification_data.txt"
> val data = spark.read.format("libsvm").load(path)
> val rfm = rf.fit(data)
> scala> rfm.setThresholds(Array(0.0,0.0,0.0))
> res4: org.apache.spark.ml.classification.RandomForestClassificationModel = 
> RandomForestClassificationModel (uid=rfc_cbe640b0eccc) with 20 trees
> scala> rfm.transform(data).show(5)
> +-----+--------------------+--------------+-------------+----------+
> |label|            features| rawPrediction|  probability|prediction|
> +-----+--------------------+--------------+-------------+----------+
> |  1.0| (4,[0,1,2,3],[-0...|[0.0,20.0,0.0]|[0.0,1.0,0.0]|       0.0|
> |  1.0| (4,[0,1,2,3],[-0...|[0.0,20.0,0.0]|[0.0,1.0,0.0]|       0.0|
> |  1.0| (4,[0,1,2,3],[-0...|[0.0,20.0,0.0]|[0.0,1.0,0.0]|       0.0|
> |  1.0| (4,[0,1,2,3],[-0...|[0.0,20.0,0.0]|[0.0,1.0,0.0]|       0.0|
> |  0.0|(4,[0,1,2,3],[0.1...|[20.0,0.0,0.0]|[1.0,0.0,0.0]|       0.0|
> +-----+--------------------+--------------+-------------+----------+
> only showing top 5 rows
> {code}
> If multiple thresholds are set to zero, the prediction of 
> {{ProbabilisticClassificationModel}} is simply the first index whose 
> corresponding threshold is 0.
> However, in this case, it would be more reasonable to mark as the 
> {{prediction}} the index with the maximum {{probability}} among the indices 
> whose threshold is 0.






[jira] [Assigned] (SPARK-17057) ProbabilisticClassifierModels' prediction more reasonable with multi zero thresholds

2016-08-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17057:


Assignee: Apache Spark

> ProbabilisticClassifierModels' prediction more reasonable with multi zero 
> thresholds
> 
>
> Key: SPARK-17057
> URL: https://issues.apache.org/jira/browse/SPARK-17057
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: zhengruifeng
>Assignee: Apache Spark
>
> {code}
> val path = "./data/mllib/sample_multiclass_classification_data.txt"
> val data = spark.read.format("libsvm").load(path)
> val rfm = rf.fit(data)
> scala> rfm.setThresholds(Array(0.0,0.0,0.0))
> res4: org.apache.spark.ml.classification.RandomForestClassificationModel = 
> RandomForestClassificationModel (uid=rfc_cbe640b0eccc) with 20 trees
> scala> rfm.transform(data).show(5)
> +-----+--------------------+--------------+-------------+----------+
> |label|            features| rawPrediction|  probability|prediction|
> +-----+--------------------+--------------+-------------+----------+
> |  1.0| (4,[0,1,2,3],[-0...|[0.0,20.0,0.0]|[0.0,1.0,0.0]|       0.0|
> |  1.0| (4,[0,1,2,3],[-0...|[0.0,20.0,0.0]|[0.0,1.0,0.0]|       0.0|
> |  1.0| (4,[0,1,2,3],[-0...|[0.0,20.0,0.0]|[0.0,1.0,0.0]|       0.0|
> |  1.0| (4,[0,1,2,3],[-0...|[0.0,20.0,0.0]|[0.0,1.0,0.0]|       0.0|
> |  0.0|(4,[0,1,2,3],[0.1...|[20.0,0.0,0.0]|[1.0,0.0,0.0]|       0.0|
> +-----+--------------------+--------------+-------------+----------+
> only showing top 5 rows
> {code}
> If multiple thresholds are set to zero, the prediction of 
> {{ProbabilisticClassificationModel}} is simply the first index whose 
> corresponding threshold is 0.
> However, in this case, it would be more reasonable to mark as the 
> {{prediction}} the index with the maximum {{probability}} among the indices 
> whose threshold is 0.






[jira] [Created] (SPARK-17057) ProbabilisticClassifierModels' prediction more reasonable with multi zero thresholds

2016-08-14 Thread zhengruifeng (JIRA)
zhengruifeng created SPARK-17057:


 Summary: ProbabilisticClassifierModels' prediction more reasonable 
with multi zero thresholds
 Key: SPARK-17057
 URL: https://issues.apache.org/jira/browse/SPARK-17057
 Project: Spark
  Issue Type: Improvement
  Components: ML
Reporter: zhengruifeng


{code}
val path = "./data/mllib/sample_multiclass_classification_data.txt"
val data = spark.read.format("libsvm").load(path)
val rfm = rf.fit(data)

scala> rfm.setThresholds(Array(0.0,0.0,0.0))
res4: org.apache.spark.ml.classification.RandomForestClassificationModel = 
RandomForestClassificationModel (uid=rfc_cbe640b0eccc) with 20 trees

scala> rfm.transform(data).show(5)
+-----+--------------------+--------------+-------------+----------+
|label|            features| rawPrediction|  probability|prediction|
+-----+--------------------+--------------+-------------+----------+
|  1.0| (4,[0,1,2,3],[-0...|[0.0,20.0,0.0]|[0.0,1.0,0.0]|       0.0|
|  1.0| (4,[0,1,2,3],[-0...|[0.0,20.0,0.0]|[0.0,1.0,0.0]|       0.0|
|  1.0| (4,[0,1,2,3],[-0...|[0.0,20.0,0.0]|[0.0,1.0,0.0]|       0.0|
|  1.0| (4,[0,1,2,3],[-0...|[0.0,20.0,0.0]|[0.0,1.0,0.0]|       0.0|
|  0.0|(4,[0,1,2,3],[0.1...|[20.0,0.0,0.0]|[1.0,0.0,0.0]|       0.0|
+-----+--------------------+--------------+-------------+----------+
only showing top 5 rows
{code}

If multiple thresholds are set to zero, the prediction of 
{{ProbabilisticClassificationModel}} is simply the first index whose 
corresponding threshold is 0.
However, in this case, it would be more reasonable to mark as the {{prediction}} 
the index with the maximum {{probability}} among the indices whose threshold is 0.
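
A minimal sketch of the proposed tie-breaking rule (illustrative only, not the 
actual {{ProbabilisticClassificationModel}} code; the scaled-probability 
fallback is an assumption about the usual thresholded prediction):

{code}
import org.apache.spark.ml.linalg.{DenseVector, Vector}

// Sketch only: pick a prediction given class probabilities and thresholds.
def predictWithThresholds(probability: Vector, thresholds: Array[Double]): Double = {
  val zeroIdx = thresholds.indices.filter(thresholds(_) == 0.0)
  if (zeroIdx.nonEmpty) {
    // Proposed behavior: among all classes whose threshold is 0,
    // pick the one with the largest probability (instead of the first one).
    zeroIdx.maxBy(probability(_)).toDouble
  } else {
    // Usual case: scale each probability by its threshold and take the argmax.
    val scaled = probability.toArray.zip(thresholds).map { case (p, t) => p / t }
    scaled.indices.maxBy(scaled(_)).toDouble
  }
}

// With the probabilities shown above and thresholds (0.0, 0.0, 0.0),
// this returns 1.0 rather than 0.0:
predictWithThresholds(new DenseVector(Array(0.0, 1.0, 0.0)), Array(0.0, 0.0, 0.0))
{code}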






[jira] [Assigned] (SPARK-17056) Fix a wrong assert in MemoryStore

2016-08-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17056:


Assignee: (was: Apache Spark)

> Fix a wrong assert in MemoryStore
> -
>
> Key: SPARK-17056
> URL: https://issues.apache.org/jira/browse/SPARK-17056
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Liang-Chi Hsieh
>Priority: Minor
>
> There is an assert in MemoryStore's putIteratorAsValues method that is used 
> to check that unroll memory is not released too much. This assert looks wrong.






[jira] [Commented] (SPARK-17056) Fix a wrong assert in MemoryStore

2016-08-14 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420627#comment-15420627
 ] 

Apache Spark commented on SPARK-17056:
--

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/14642

> Fix a wrong assert in MemoryStore
> -
>
> Key: SPARK-17056
> URL: https://issues.apache.org/jira/browse/SPARK-17056
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Liang-Chi Hsieh
>Priority: Minor
>
> There is an assert in MemoryStore's putIteratorAsValues method that is used 
> to check that unroll memory is not released too much. This assert looks wrong.






[jira] [Assigned] (SPARK-17056) Fix a wrong assert in MemoryStore

2016-08-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17056:


Assignee: Apache Spark

> Fix a wrong assert in MemoryStore
> -
>
> Key: SPARK-17056
> URL: https://issues.apache.org/jira/browse/SPARK-17056
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Liang-Chi Hsieh
>Assignee: Apache Spark
>Priority: Minor
>
> There is an assert in MemoryStore's putIteratorAsValues method that is used 
> to check that unroll memory is not released too much. This assert looks wrong.






[jira] [Created] (SPARK-17056) Fix a wrong assert in MemoryStore

2016-08-14 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-17056:
---

 Summary: Fix a wrong assert in MemoryStore
 Key: SPARK-17056
 URL: https://issues.apache.org/jira/browse/SPARK-17056
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: Liang-Chi Hsieh
Priority: Minor


There is an assert in MemoryStore's putIteratorAsValues method that is used to 
check that unroll memory is not released too much. This assert looks wrong.






[jira] [Updated] (SPARK-17055) add labelKFold to CrossValidator

2016-08-14 Thread Vincent (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent updated SPARK-17055:

Description: 
The current CrossValidator only supports k-fold, which randomly divides all the 
samples into k groups. But in cases where data is gathered from different 
subjects and we want to avoid over-fitting, we want to hold out samples with 
certain labels from the training data and put them into the validation fold, 
i.e. we want to ensure that the same label is not in both the testing and 
training sets.

Mainstream packages like scikit-learn already support such a cross-validation 
method. 
(http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.LabelKFold.html#sklearn.cross_validation.LabelKFold)

  was:
The current CrossValidator only supports k-fold, which randomly divides all the 
samples into k groups. But in cases where data is gathered from different 
subjects and we want to avoid over-fitting, we want to hold out samples with 
certain labels from the training data and put them into the validation fold, 
i.e. we want to ensure that the same label is not in both the testing and 
training sets.

Mainstream packages like scikit-learn already support such a cross-validation 
method. 


> add labelKFold to CrossValidator
> 
>
> Key: SPARK-17055
> URL: https://issues.apache.org/jira/browse/SPARK-17055
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Vincent
>Priority: Minor
>
> The current CrossValidator only supports k-fold, which randomly divides all 
> the samples into k groups. But in cases where data is gathered from different 
> subjects and we want to avoid over-fitting, we want to hold out samples with 
> certain labels from the training data and put them into the validation fold, 
> i.e. we want to ensure that the same label is not in both the testing and 
> training sets.
> Mainstream packages like scikit-learn already support such a cross-validation 
> method. 
> (http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.LabelKFold.html#sklearn.cross_validation.LabelKFold)






[jira] [Updated] (SPARK-17055) add labelKFold to CrossValidator

2016-08-14 Thread Vincent (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent updated SPARK-17055:

Affects Version/s: (was: 2.0.0)

> add labelKFold to CrossValidator
> 
>
> Key: SPARK-17055
> URL: https://issues.apache.org/jira/browse/SPARK-17055
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Vincent
>Priority: Minor
>
> The current CrossValidator only supports k-fold, which randomly divides all 
> the samples into k groups. But in cases where data is gathered from different 
> subjects and we want to avoid over-fitting, we want to hold out samples with 
> certain labels from the training data and put them into the validation fold, 
> i.e. we want to ensure that the same label is not in both the testing and 
> training sets.
> Mainstream packages like scikit-learn already support such a cross-validation 
> method. 






[jira] [Assigned] (SPARK-17055) add labelKFold to CrossValidator

2016-08-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17055:


Assignee: Apache Spark

> add labelKFold to CrossValidator
> 
>
> Key: SPARK-17055
> URL: https://issues.apache.org/jira/browse/SPARK-17055
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Affects Versions: 2.0.0
>Reporter: Vincent
>Assignee: Apache Spark
>Priority: Minor
>
> The current CrossValidator only supports k-fold, which randomly divides all 
> the samples into k groups. But in cases where data is gathered from different 
> subjects and we want to avoid over-fitting, we want to hold out samples with 
> certain labels from the training data and put them into the validation fold, 
> i.e. we want to ensure that the same label is not in both the testing and 
> training sets.
> Mainstream packages like scikit-learn already support such a cross-validation 
> method. 






[jira] [Assigned] (SPARK-17055) add labelKFold to CrossValidator

2016-08-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17055:


Assignee: (was: Apache Spark)

> add labelKFold to CrossValidator
> 
>
> Key: SPARK-17055
> URL: https://issues.apache.org/jira/browse/SPARK-17055
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Affects Versions: 2.0.0
>Reporter: Vincent
>Priority: Minor
>
> The current CrossValidator only supports k-fold, which randomly divides all 
> the samples into k groups. But in cases where data is gathered from different 
> subjects and we want to avoid over-fitting, we want to hold out samples with 
> certain labels from the training data and put them into the validation fold, 
> i.e. we want to ensure that the same label is not in both the testing and 
> training sets.
> Mainstream packages like scikit-learn already support such a cross-validation 
> method. 






[jira] [Commented] (SPARK-17055) add labelKFold to CrossValidator

2016-08-14 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420569#comment-15420569
 ] 

Apache Spark commented on SPARK-17055:
--

User 'VinceShieh' has created a pull request for this issue:
https://github.com/apache/spark/pull/14640

> add labelKFold to CrossValidator
> 
>
> Key: SPARK-17055
> URL: https://issues.apache.org/jira/browse/SPARK-17055
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Affects Versions: 2.0.0
>Reporter: Vincent
>Priority: Minor
>
> The current CrossValidator only supports k-fold, which randomly divides all 
> the samples into k groups. But in cases where data is gathered from different 
> subjects and we want to avoid over-fitting, we want to hold out samples with 
> certain labels from the training data and put them into the validation fold, 
> i.e. we want to ensure that the same label is not in both the testing and 
> training sets.
> Mainstream packages like scikit-learn already support such a cross-validation 
> method. 






[jira] [Created] (SPARK-17055) add labelKFold to CrossValidator

2016-08-14 Thread Vincent (JIRA)
Vincent created SPARK-17055:
---

 Summary: add labelKFold to CrossValidator
 Key: SPARK-17055
 URL: https://issues.apache.org/jira/browse/SPARK-17055
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Affects Versions: 2.0.0
Reporter: Vincent
Priority: Minor


The current CrossValidator only supports k-fold, which randomly divides all the 
samples into k groups. But in cases where data is gathered from different 
subjects and we want to avoid over-fitting, we want to hold out samples with 
certain labels from the training data and put them into the validation fold, 
i.e. we want to ensure that the same label is not in both the testing and 
training sets.

Mainstream packages like scikit-learn already support such a cross-validation 
method. 
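
For reference, a hypothetical sketch of label-aware fold assignment in the 
spirit of scikit-learn's LabelKFold (the function name and shape are 
illustrative, not a proposed Spark API): every sample carrying the same label 
lands in the same fold, so a label never appears in both training and 
validation sets.

{code}
// Greedily assign each distinct label to the currently smallest fold.
def labelKFold(labels: Seq[String], k: Int): Map[String, Int] = {
  val labelCounts = labels.groupBy(identity).mapValues(_.size).toSeq.sortBy(-_._2)
  val foldSizes = Array.fill(k)(0)
  labelCounts.map { case (label, count) =>
    val fold = foldSizes.indices.minBy(foldSizes(_))
    foldSizes(fold) += count
    label -> fold
  }.toMap
}

// Example: samples from 3 subjects; each subject ends up entirely in one fold.
labelKFold(Seq("s1", "s1", "s2", "s2", "s2", "s3"), k = 2)
// e.g. Map(s2 -> 0, s1 -> 1, s3 -> 1)
{code}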






[jira] [Comment Edited] (SPARK-6235) Address various 2G limits

2016-08-14 Thread Guoqiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420554#comment-15420554
 ] 

Guoqiang Li edited comment on SPARK-6235 at 8/15/16 1:53 AM:
-

[~hvanhovell]
The main changes:

1. Replace the DiskStore method {{def getBytes(blockId: BlockId): 
ChunkedByteBuffer}} with {{def getBlockData(blockId: BlockId): ManagedBuffer}}.

2. ManagedBuffer's nioByteBuffer method returns a ChunkedByteBuffer.

3. Add the class {{ChunkFetchInputStream}}, used for flow control; its code is 
as follows:

{noformat}

package org.apache.spark.network.client;

import java.io.IOException;
import java.io.InputStream;
import java.nio.channels.ClosedChannelException;
import java.util.Iterator;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

import com.google.common.primitives.UnsignedBytes;
import io.netty.buffer.ByteBuf;
import io.netty.channel.Channel;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import org.apache.spark.network.buffer.ChunkedByteBuffer;
import org.apache.spark.network.buffer.ManagedBuffer;
import org.apache.spark.network.protocol.StreamChunkId;
import org.apache.spark.network.util.LimitedInputStream;
import org.apache.spark.network.util.TransportFrameDecoder;

public class ChunkFetchInputStream extends InputStream {
  private final Logger logger = 
LoggerFactory.getLogger(ChunkFetchInputStream.class);

  private final TransportResponseHandler handler;
  private final Channel channel;
  private final StreamChunkId streamId;
  private final long byteCount;
  private final ChunkReceivedCallback callback;
  private final LinkedBlockingQueue<ByteBuf> buffers = new 
LinkedBlockingQueue<>(1024);
  public final TransportFrameDecoder.Interceptor interceptor;

  private ByteBuf curChunk;
  private boolean isCallbacked = false;
  private long writerIndex = 0;

  private final AtomicReference<Throwable> cause = new AtomicReference<>(null);
  private final AtomicBoolean isClosed = new AtomicBoolean(false);

  public ChunkFetchInputStream(
  TransportResponseHandler handler,
  Channel channel,
  StreamChunkId streamId,
  long byteCount,
  ChunkReceivedCallback callback) {
this.handler = handler;
this.channel = channel;
this.streamId = streamId;
this.byteCount = byteCount;
this.callback = callback;
this.interceptor = new StreamInterceptor();
  }

  @Override
  public int read() throws IOException {
if (isClosed.get()) return -1;
pullChunk();
if (curChunk != null) {
  byte b = curChunk.readByte();
  return UnsignedBytes.toInt(b);
} else {
  return -1;
}
  }

  @Override
  public int read(byte[] dest, int offset, int length) throws IOException {
if (isClosed.get()) return -1;
pullChunk();
if (curChunk != null) {
  int amountToGet = Math.min(curChunk.readableBytes(), length);
  curChunk.readBytes(dest, offset, amountToGet);
  return amountToGet;
} else {
  return -1;
}
  }

  @Override
  public long skip(long bytes) throws IOException {
if (isClosed.get()) return 0L;
pullChunk();
if (curChunk != null) {
  int amountToSkip = (int) Math.min(bytes, curChunk.readableBytes());
  curChunk.skipBytes(amountToSkip);
  return amountToSkip;
} else {
  return 0L;
}
  }

  @Override
  public void close() throws IOException {
if (!isClosed.get()) {
  releaseCurChunk();
  isClosed.set(true);
  resetChannel();
  Iterator<ByteBuf> itr = buffers.iterator();
  while (itr.hasNext()) {
itr.next().release();
  }
  buffers.clear();
}
  }

  private void pullChunk() throws IOException {
if (curChunk != null && !curChunk.isReadable()) releaseCurChunk();
if (curChunk == null && cause.get() == null && !isClosed.get()) {
  try {
curChunk = buffers.take();
// if channel.read() is not invoked automatically (auto-read is disabled),
// call it here to request the next chunk
if (!channel.config().isAutoRead()) channel.read();
  } catch (Throwable e) {
setCause(e);
  }
}
if (cause.get() != null) throw new IOException(cause.get());
  }

  private void setCause(Throwable e) {
if (cause.get() == null) cause.set(e);
  }

  private void releaseCurChunk() {
if (curChunk != null) {
  curChunk.release();
  curChunk = null;
}
  }

  private void onSuccess() throws IOException {
if (isCallbacked) return;
if (cause.get() != null) {
  callback.onFailure(streamId.chunkIndex, cause.get());
} else {
  InputStream inputStream = new LimitedInputStream(this, byteCount);
  ManagedBuffer managedBuffer = new InputStreamManagedBuffer(inputStream, 
byteCount);
  callback.onSuccess(streamId.chunkIndex, managedBuffer);
}
isCallbacked = true;
  }

  private void resetChannel() {
if 

[jira] [Commented] (SPARK-6235) Address various 2G limits

2016-08-14 Thread Guoqiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420554#comment-15420554
 ] 

Guoqiang Li commented on SPARK-6235:


[~hvanhovell]
The main changes:

1. Replace the DiskStore method {{def getBytes(blockId: BlockId): 
ChunkedByteBuffer}} with {{def getBlockData(blockId: BlockId): ManagedBuffer}}.

2. ManagedBuffer's nioByteBuffer method returns a ChunkedByteBuffer.

3. Add the class {{ChunkFetchInputStream}}, used for flow control; its code is as follows:

{noformat}

package org.apache.spark.network.client;

import java.io.IOException;
import java.io.InputStream;
import java.nio.channels.ClosedChannelException;
import java.util.Iterator;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

import com.google.common.primitives.UnsignedBytes;
import io.netty.buffer.ByteBuf;
import io.netty.channel.Channel;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import org.apache.spark.network.buffer.ChunkedByteBuffer;
import org.apache.spark.network.buffer.ManagedBuffer;
import org.apache.spark.network.protocol.StreamChunkId;
import org.apache.spark.network.util.LimitedInputStream;
import org.apache.spark.network.util.TransportFrameDecoder;

public class ChunkFetchInputStream extends InputStream {
  private final Logger logger = 
LoggerFactory.getLogger(ChunkFetchInputStream.class);

  private final TransportResponseHandler handler;
  private final Channel channel;
  private final StreamChunkId streamId;
  private final long byteCount;
  private final ChunkReceivedCallback callback;
  private final LinkedBlockingQueue<ByteBuf> buffers = new 
LinkedBlockingQueue<>(1024);
  public final TransportFrameDecoder.Interceptor interceptor;

  private ByteBuf curChunk;
  private boolean isCallbacked = false;
  private long writerIndex = 0;

  private final AtomicReference<Throwable> cause = new AtomicReference<>(null);
  private final AtomicBoolean isClosed = new AtomicBoolean(false);

  public ChunkFetchInputStream(
  TransportResponseHandler handler,
  Channel channel,
  StreamChunkId streamId,
  long byteCount,
  ChunkReceivedCallback callback) {
this.handler = handler;
this.channel = channel;
this.streamId = streamId;
this.byteCount = byteCount;
this.callback = callback;
this.interceptor = new StreamInterceptor();
  }

  @Override
  public int read() throws IOException {
if (isClosed.get()) return -1;
pullChunk();
if (curChunk != null) {
  byte b = curChunk.readByte();
  return UnsignedBytes.toInt(b);
} else {
  return -1;
}
  }

  @Override
  public int read(byte[] dest, int offset, int length) throws IOException {
if (isClosed.get()) return -1;
pullChunk();
if (curChunk != null) {
  int amountToGet = Math.min(curChunk.readableBytes(), length);
  curChunk.readBytes(dest, offset, amountToGet);
  return amountToGet;
} else {
  return -1;
}
  }

  @Override
  public long skip(long bytes) throws IOException {
if (isClosed.get()) return 0L;
pullChunk();
if (curChunk != null) {
  int amountToSkip = (int) Math.min(bytes, curChunk.readableBytes());
  curChunk.skipBytes(amountToSkip);
  return amountToSkip;
} else {
  return 0L;
}
  }

  @Override
  public void close() throws IOException {
if (!isClosed.get()) {
  releaseCurChunk();
  isClosed.set(true);
  resetChannel();
  Iterator<ByteBuf> itr = buffers.iterator();
  while (itr.hasNext()) {
itr.next().release();
  }
  buffers.clear();
}
  }

  private void pullChunk() throws IOException {
if (curChunk != null && !curChunk.isReadable()) releaseCurChunk();
if (curChunk == null && cause.get() == null && !isClosed.get()) {
  try {
curChunk = buffers.take();
// if channel.read() is not invoked automatically (auto-read is disabled),
// call it here to request the next chunk
if (!channel.config().isAutoRead()) channel.read();
  } catch (Throwable e) {
setCause(e);
  }
}
if (cause.get() != null) throw new IOException(cause.get());
  }

  private void setCause(Throwable e) {
if (cause.get() == null) cause.set(e);
  }

  private void releaseCurChunk() {
if (curChunk != null) {
  curChunk.release();
  curChunk = null;
}
  }

  private void onSuccess() throws IOException {
if (isCallbacked) return;
if (cause.get() != null) {
  callback.onFailure(streamId.chunkIndex, cause.get());
} else {
  InputStream inputStream = new LimitedInputStream(this, byteCount);
  ManagedBuffer managedBuffer = new InputStreamManagedBuffer(inputStream, 
byteCount);
  callback.onSuccess(streamId.chunkIndex, managedBuffer);
}
isCallbacked = true;
  }

  private void resetChannel() {
if (!channel.config().isAutoRead()) {
  

[jira] [Commented] (SPARK-17054) SparkR can not run in yarn-cluster mode on mac os

2016-08-14 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420553#comment-15420553
 ] 

Jeff Zhang commented on SPARK-17054:


Although I can fix it by using the correct cache dir for macOS, I am confused 
about why we need to download SparkR at all. I don't remember this being needed 
in Spark 1.x. Is this the expected behavior? [~shivaram] [~junyangq]

> SparkR can not run in yarn-cluster mode on mac os
> -
>
> Key: SPARK-17054
> URL: https://issues.apache.org/jira/browse/SPARK-17054
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.0.0
>Reporter: Jeff Zhang
>
> This is because it downloads SparkR to the wrong place.
> {noformat}
> Warning message:
> 'sparkR.init' is deprecated.
> Use 'sparkR.session' instead.
> See help("Deprecated")
> Spark not found in SPARK_HOME:  .
> To search in the cache directory. Installation will start if not found.
> Mirror site not provided.
> Looking for site suggested from apache website...
> Preferred mirror site found: http://apache.mirror.cdnetworks.com/spark
> Downloading Spark spark-2.0.0 for Hadoop 2.7 from:
> - 
> http://apache.mirror.cdnetworks.com/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz
> Fetch failed from http://apache.mirror.cdnetworks.com/spark
>  open destfile '/home//Library/Caches/spark/spark-2.0.0-bin-hadoop2.7.tgz', 
> reason 'No such file or directory'>
> To use backup site...
> Downloading Spark spark-2.0.0 for Hadoop 2.7 from:
> - 
> http://www-us.apache.org/dist/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz
> Fetch failed from http://www-us.apache.org/dist/spark
>  open destfile '/home//Library/Caches/spark/spark-2.0.0-bin-hadoop2.7.tgz', 
> reason 'No such file or directory'>
> Error in robust_download_tar(mirrorUrl, version, hadoopVersion, packageName,  
> :
>   Unable to download Spark spark-2.0.0 for Hadoop 2.7. Please check network 
> connection, Hadoop version, or provide other mirror sites.
> Calls: sparkRSQL.init ... sparkR.session -> install.spark -> 
> robust_download_tar
> In addition: Warning messages:
> 1: 'sparkRSQL.init' is deprecated.
> Use 'sparkR.session' instead.
> See help("Deprecated")
> 2: In dir.create(localDir, recursive = TRUE) :
>   cannot create dir '/home//Library', reason 'Operation not supported'
> Execution halted
> {noformat}






[jira] [Created] (SPARK-17054) SparkR can not run in yarn-cluster mode on mac os

2016-08-14 Thread Jeff Zhang (JIRA)
Jeff Zhang created SPARK-17054:
--

 Summary: SparkR can not run in yarn-cluster mode on mac os
 Key: SPARK-17054
 URL: https://issues.apache.org/jira/browse/SPARK-17054
 Project: Spark
  Issue Type: Bug
  Components: SparkR
Affects Versions: 2.0.0
Reporter: Jeff Zhang


This is because it downloads SparkR to the wrong place.
{noformat}
Warning message:
'sparkR.init' is deprecated.
Use 'sparkR.session' instead.
See help("Deprecated")
Spark not found in SPARK_HOME:  .
To search in the cache directory. Installation will start if not found.
Mirror site not provided.
Looking for site suggested from apache website...
Preferred mirror site found: http://apache.mirror.cdnetworks.com/spark
Downloading Spark spark-2.0.0 for Hadoop 2.7 from:
- 
http://apache.mirror.cdnetworks.com/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz
Fetch failed from http://apache.mirror.cdnetworks.com/spark

To use backup site...
Downloading Spark spark-2.0.0 for Hadoop 2.7 from:
- http://www-us.apache.org/dist/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz
Fetch failed from http://www-us.apache.org/dist/spark

Error in robust_download_tar(mirrorUrl, version, hadoopVersion, packageName,  :
  Unable to download Spark spark-2.0.0 for Hadoop 2.7. Please check network 
connection, Hadoop version, or provide other mirror sites.
Calls: sparkRSQL.init ... sparkR.session -> install.spark -> robust_download_tar
In addition: Warning messages:
1: 'sparkRSQL.init' is deprecated.
Use 'sparkR.session' instead.
See help("Deprecated")
2: In dir.create(localDir, recursive = TRUE) :
  cannot create dir '/home//Library', reason 'Operation not supported'
Execution halted
{noformat}






[jira] [Commented] (SPARK-16781) java launched by PySpark as gateway may not be the same java used in the spark environment

2016-08-14 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420547#comment-15420547
 ] 

Jeff Zhang commented on SPARK-16781:


JAVA_HOME will be set by YARN; I'm not sure about other cluster managers. 

> java launched by PySpark as gateway may not be the same java used in the 
> spark environment
> --
>
> Key: SPARK-16781
> URL: https://issues.apache.org/jira/browse/SPARK-16781
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.6.2
>Reporter: Michael Berman
>
> When launching spark on a system with multiple javas installed, there are a 
> few options for choosing which JRE to use, setting `JAVA_HOME` being the most 
> straightforward.
> However, when pyspark's internal py4j launches its JavaGateway, it always 
> invokes `java` directly, without qualification. This means you get whatever 
> java's first on your path, which is not necessarily the same one in spark's 
> JAVA_HOME.
> This could be seen as a py4j issue, but from their point of view, the fix is 
> easy: make sure the java you want is first on your path. I can't figure out a 
> way to make that reliably happen through the pyspark executor launch path, 
> and it seems like something that would ideally happen automatically. If I set 
> JAVA_HOME when launching spark, I would expect that to be the only java used 
> throughout the stack.






[jira] [Commented] (SPARK-14165) NoSuchElementException: None.get when joining DataFrames with Seq of fields of different case

2016-08-14 Thread Jacek Laskowski (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420541#comment-15420541
 ] 

Jacek Laskowski commented on SPARK-14165:
-

Thanks [~dongjoon] for looking into it. Yes, the more general join works fine. 
I think the other example where the join expression is given should be fixed.

> NoSuchElementException: None.get when joining DataFrames with Seq of fields 
> of different case
> -
>
> Key: SPARK-14165
> URL: https://issues.apache.org/jira/browse/SPARK-14165
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Jacek Laskowski
>Priority: Minor
>
> {code}
> scala> val left = Seq((1,"a")).toDF("id", "abc")
> left: org.apache.spark.sql.DataFrame = [id: int, abc: string]
> scala> val right = Seq((1,"a")).toDF("id", "ABC")
> right: org.apache.spark.sql.DataFrame = [id: int, ABC: string]
> scala> left.join(right, Seq("abc"))
> java.util.NoSuchElementException: None.get
>   at scala.None$.get(Option.scala:347)
>   at scala.None$.get(Option.scala:345)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$62.apply(Analyzer.scala:1444)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$62.apply(Analyzer.scala:1444)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
>   at scala.collection.immutable.List.map(List.scala:285)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$commonNaturalJoinProcessing(Analyzer.scala:1444)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNaturalAndUsingJoin$$anonfun$apply$29.applyOrElse(Analyzer.scala:1426)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNaturalAndUsingJoin$$anonfun$apply$29.applyOrElse(Analyzer.scala:1418)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:58)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:58)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:67)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:57)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNaturalAndUsingJoin$.apply(Analyzer.scala:1418)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNaturalAndUsingJoin$.apply(Analyzer.scala:1417)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:83)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:80)
>   at 
> scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
>   at scala.collection.immutable.List.foldLeft(List.scala:84)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:80)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:72)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:72)
>   at 
> org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:41)
>   at 
> org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:41)
>   at 
> org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:34)
>   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:58)
>   at 
> org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withPlan(Dataset.scala:2299)
>   at org.apache.spark.sql.Dataset.join(Dataset.scala:553)
>   at org.apache.spark.sql.Dataset.join(Dataset.scala:526)
>   ... 51 elided
> {code}






[jira] [Commented] (SPARK-17035) Conversion of datetime.max to microseconds produces incorrect value

2016-08-14 Thread Michael Styles (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420534#comment-15420534
 ] 

Michael Styles commented on SPARK-17035:


I have a fix for this issue if you would like to assign the problem to me.



> Conversion of datetime.max to microseconds produces incorrect value
> ---
>
> Key: SPARK-17035
> URL: https://issues.apache.org/jira/browse/SPARK-17035
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.0.0
>Reporter: Michael Styles
>Priority: Minor
>
> Conversion of datetime.max to microseconds produces an incorrect value. For 
> example,
> {noformat}
> from datetime import datetime
> from pyspark.sql import Row
> from pyspark.sql.types import StructType, StructField, TimestampType
> schema = StructType([StructField("dt", TimestampType(), False)])
> data = [{"dt": datetime.max}]
> # convert python objects to sql data
> sql_data = [schema.toInternal(row) for row in data]
> # Value is wrong.
> sql_data
> [(2.534023188e+17,)]
> {noformat}
> This value should be [(2534023187,)].






[jira] [Commented] (SPARK-14165) NoSuchElementException: None.get when joining DataFrames with Seq of fields of different case

2016-08-14 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420527#comment-15420527
 ] 

Dongjoon Hyun commented on SPARK-14165:
---

I'm wondering if we need to fix the example in your comment. This is still the 
same.
{code}
scala> left.join(right, $"abc" === $"ABC")
org.apache.spark.sql.AnalysisException: Reference 'abc' is ambiguous, could be: 
abc#6, abc#16.;
{code}
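
(A sketch of the usual way to disambiguate that join expression, by qualifying 
each column with its parent DataFrame, in case that is acceptable here:)

{code}
scala> left.join(right, left("abc") === right("ABC")).show
{code}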

> NoSuchElementException: None.get when joining DataFrames with Seq of fields 
> of different case
> -
>
> Key: SPARK-14165
> URL: https://issues.apache.org/jira/browse/SPARK-14165
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Jacek Laskowski
>Priority: Minor
>
> {code}
> scala> val left = Seq((1,"a")).toDF("id", "abc")
> left: org.apache.spark.sql.DataFrame = [id: int, abc: string]
> scala> val right = Seq((1,"a")).toDF("id", "ABC")
> right: org.apache.spark.sql.DataFrame = [id: int, ABC: string]
> scala> left.join(right, Seq("abc"))
> java.util.NoSuchElementException: None.get
>   at scala.None$.get(Option.scala:347)
>   at scala.None$.get(Option.scala:345)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$62.apply(Analyzer.scala:1444)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$62.apply(Analyzer.scala:1444)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
>   at scala.collection.immutable.List.map(List.scala:285)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$commonNaturalJoinProcessing(Analyzer.scala:1444)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNaturalAndUsingJoin$$anonfun$apply$29.applyOrElse(Analyzer.scala:1426)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNaturalAndUsingJoin$$anonfun$apply$29.applyOrElse(Analyzer.scala:1418)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:58)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:58)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:67)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:57)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNaturalAndUsingJoin$.apply(Analyzer.scala:1418)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNaturalAndUsingJoin$.apply(Analyzer.scala:1417)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:83)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:80)
>   at 
> scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
>   at scala.collection.immutable.List.foldLeft(List.scala:84)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:80)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:72)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:72)
>   at 
> org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:41)
>   at 
> org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:41)
>   at 
> org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:34)
>   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:58)
>   at 
> org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withPlan(Dataset.scala:2299)
>   at org.apache.spark.sql.Dataset.join(Dataset.scala:553)
>   at org.apache.spark.sql.Dataset.join(Dataset.scala:526)
>   ... 51 elided
> {code}






[jira] [Commented] (SPARK-14165) NoSuchElementException: None.get when joining DataFrames with Seq of fields of different case

2016-08-14 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420525#comment-15420525
 ] 

Dongjoon Hyun commented on SPARK-14165:
---

Hi, [~ja...@japila.pl].

Spark 2.0 seems to have been released without this problem.

{code}
scala> val left = Seq((1,"a")).toDF("id", "abc")
scala> val right = Seq((1,"a")).toDF("id", "ABC")
scala> left.join(right, Seq("abc")).show
+---+---+---+
|abc| id| id|
+---+---+---+
|  a|  1|  1|
+---+---+---+
scala> spark.version
res1: String = 2.0.0
{code}

Could you confirm this?

> NoSuchElementException: None.get when joining DataFrames with Seq of fields 
> of different case
> -
>
> Key: SPARK-14165
> URL: https://issues.apache.org/jira/browse/SPARK-14165
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Jacek Laskowski
>Priority: Minor
>
> {code}
> scala> val left = Seq((1,"a")).toDF("id", "abc")
> left: org.apache.spark.sql.DataFrame = [id: int, abc: string]
> scala> val right = Seq((1,"a")).toDF("id", "ABC")
> right: org.apache.spark.sql.DataFrame = [id: int, ABC: string]
> scala> left.join(right, Seq("abc"))
> java.util.NoSuchElementException: None.get
>   at scala.None$.get(Option.scala:347)
>   at scala.None$.get(Option.scala:345)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$62.apply(Analyzer.scala:1444)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$62.apply(Analyzer.scala:1444)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
>   at scala.collection.immutable.List.map(List.scala:285)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$commonNaturalJoinProcessing(Analyzer.scala:1444)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNaturalAndUsingJoin$$anonfun$apply$29.applyOrElse(Analyzer.scala:1426)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNaturalAndUsingJoin$$anonfun$apply$29.applyOrElse(Analyzer.scala:1418)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:58)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:58)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:67)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:57)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNaturalAndUsingJoin$.apply(Analyzer.scala:1418)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNaturalAndUsingJoin$.apply(Analyzer.scala:1417)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:83)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:80)
>   at 
> scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
>   at scala.collection.immutable.List.foldLeft(List.scala:84)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:80)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:72)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:72)
>   at 
> org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:41)
>   at 
> org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:41)
>   at 
> org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:34)
>   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:58)
>   at 
> org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withPlan(Dataset.scala:2299)
>   at org.apache.spark.sql.Dataset.join(Dataset.scala:553)
>   at org.apache.spark.sql.Dataset.join(Dataset.scala:526)
>   ... 51 elided
> {code}






[jira] [Commented] (SPARK-17053) Spark ignores hive.exec.drop.ignorenonexistent=true option

2016-08-14 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420515#comment-15420515
 ] 

Dongjoon Hyun commented on SPARK-17053:
---

Yep. I closed the PR, too. Thank you for the quick decision, [~rxin].

> Spark ignores hive.exec.drop.ignorenonexistent=true option
> --
>
> Key: SPARK-17053
> URL: https://issues.apache.org/jira/browse/SPARK-17053
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Gokhan Civan
>
> In version 1.6.1, the following does not throw an exception:
> create table a as select 1; drop table a; drop table a;
> In version 2.0.0, the second drop fails; this is not compatible with Hive.
> The same problem exists for views.






[jira] [Closed] (SPARK-17053) Spark ignores hive.exec.drop.ignorenonexistent=true option

2016-08-14 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin closed SPARK-17053.
---
Resolution: Won't Fix

> Spark ignores hive.exec.drop.ignorenonexistent=true option
> --
>
> Key: SPARK-17053
> URL: https://issues.apache.org/jira/browse/SPARK-17053
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Gokhan Civan
>
> In version 1.6.1, the following does not throw an exception:
> create table a as select 1; drop table a; drop table a;
> In version 2.0.0, the second drop fails; this is not compatible with Hive.
> The same problem exists for views.






[jira] [Commented] (SPARK-17053) Spark ignores hive.exec.drop.ignorenonexistent=true option

2016-08-14 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420507#comment-15420507
 ] 

Reynold Xin commented on SPARK-17053:
-

It was by accident that this was supported, because Spark simply used Hive's 
code to do all DDL operations. In Spark 2.0, Spark implemented all the DDLs 
natively in Spark, and it is not the intention of the project to support all 
the esoteric features provided by Hive.
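
(For reference, the portable way to keep the lenient behavior in 2.0 is to be 
explicit; a spark-shell sketch, assuming a SparkSession named {{spark}}:)

{code}
spark.sql("CREATE TABLE a AS SELECT 1")
spark.sql("DROP TABLE a")
// spark.sql("DROP TABLE a")        // the second drop fails in 2.0.0 (see description below)
spark.sql("DROP TABLE IF EXISTS a") // never fails, whether or not the table exists
{code}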


> Spark ignores hive.exec.drop.ignorenonexistent=true option
> --
>
> Key: SPARK-17053
> URL: https://issues.apache.org/jira/browse/SPARK-17053
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Gokhan Civan
>
> In version 1.6.1, the following does not throw an exception:
> create table a as select 1; drop table a; drop table a;
> In version 2.0.0, the second drop fails; this is not compatible with Hive.
> The same problem exists for views.






[jira] [Comment Edited] (SPARK-17053) Spark ignores hive.exec.drop.ignorenonexistent=true option

2016-08-14 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420507#comment-15420507
 ] 

Reynold Xin edited comment on SPARK-17053 at 8/14/16 10:32 PM:
---

It was by accident that this was supported, because Spark simply used Hive's 
code to do all DDL operations. In Spark 2.0, Spark implemented all the DDLs 
natively in Spark, and it is not the intention of the project to support all 
the esoteric feature flags provided by Hive.



was (Author: rxin):
It was by accident that this was supported, because Spark simply used Hive's 
code to do all DDL operations. In Spark 2.0, Spark implemented all the DDLs 
natively in Spark, and it is not the intention of the project to support all 
the esoteric features provided by Hive.


> Spark ignores hive.exec.drop.ignorenonexistent=true option
> --
>
> Key: SPARK-17053
> URL: https://issues.apache.org/jira/browse/SPARK-17053
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Gokhan Civan
>
> In version 1.6.1, the following does not throw an exception:
> create table a as select 1; drop table a; drop table a;
> In version 2.0.0, the second drop fails; this is not compatible with Hive.
> The same problem exists for views.






[jira] [Commented] (SPARK-11374) skip.header.line.count is ignored in HiveContext

2016-08-14 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420503#comment-15420503
 ] 

Dongjoon Hyun commented on SPARK-11374:
---

Hi [~stephane.maa...@gmail.com],

Thank you for the comments. Yep, I noticed that option too, but it seems more 
tricky.

The current approach of the Spark Scala API and my PR is to check whether the 
partition's file start position is zero, so it is not straightforward to apply 
the same idea to a footer option.

For this issue, I think the approach is acceptable since the Spark Scala API 
already supports the `header` option.

However, for the `footer` option, I think we need a new JIRA issue to get some 
attention and to build consensus for that option.
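
(A rough illustration of the file-start-position check mentioned above, not the 
actual PR code: drop the first line only for the partition whose records start 
at byte offset zero of the underlying file. It assumes a spark-shell {{sc}} and 
a hypothetical path:)

{code}
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.TextInputFormat

// The LongWritable key is the byte offset of each line within its file,
// so a first record with offset 0 is the header line of that file.
val lines = sc.hadoopFile[LongWritable, Text, TextInputFormat]("/path/to/table.csv")
val withoutHeader = lines.mapPartitions { iter =>
  val buffered = iter.buffered
  if (buffered.hasNext && buffered.head._1.get() == 0L) buffered.drop(1) else buffered
}.map(_._2.toString)
{code}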

Thanks,
Dongjoon.

> skip.header.line.count is ignored in HiveContext
> 
>
> Key: SPARK-11374
> URL: https://issues.apache.org/jira/browse/SPARK-11374
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.1
>Reporter: Daniel Haviv
>
> A CSV table in Hive is configured to skip the header row using 
> TBLPROPERTIES("skip.header.line.count"="1").
> When querying from Hive, the header row is not included in the data, but when 
> running the same query via HiveContext I get the header row.
> "show create table " via the HiveContext confirms that it is aware of the 
> setting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11374) skip.header.line.count is ignored in HiveContext

2016-08-14 Thread Stephane Maarek (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420491#comment-15420491
 ] 

Stephane Maarek commented on SPARK-11374:
-

Hi,

Thanks for the PR. Can you also test the footer option? We might as well
solve both issues.

Thanks
Stéphane




> skip.header.line.count is ignored in HiveContext
> 
>
> Key: SPARK-11374
> URL: https://issues.apache.org/jira/browse/SPARK-11374
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.1
>Reporter: Daniel Haviv
>
> csv table in Hive which is configured to skip the header row using 
> TBLPROPERTIES("skip.header.line.count"="1").
> When querying from Hive the header row is not included in the data, but when 
> running the same query via HiveContext I get the header row.
> "show create table " via the HiveContext confirms that it is aware of the 
> setting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11374) skip.header.line.count is ignored in HiveContext

2016-08-14 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420489#comment-15420489
 ] 

Apache Spark commented on SPARK-11374:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/14638

> skip.header.line.count is ignored in HiveContext
> 
>
> Key: SPARK-11374
> URL: https://issues.apache.org/jira/browse/SPARK-11374
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.1
>Reporter: Daniel Haviv
>
> csv table in Hive which is configured to skip the header row using 
> TBLPROPERTIES("skip.header.line.count"="1").
> When querying from Hive the header row is not included in the data, but when 
> running the same query via HiveContext I get the header row.
> "show create table " via the HiveContext confirms that it is aware of the 
> setting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-11374) skip.header.line.count is ignored in HiveContext

2016-08-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-11374:


Assignee: (was: Apache Spark)

> skip.header.line.count is ignored in HiveContext
> 
>
> Key: SPARK-11374
> URL: https://issues.apache.org/jira/browse/SPARK-11374
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.1
>Reporter: Daniel Haviv
>
> csv table in Hive which is configured to skip the header row using 
> TBLPROPERTIES("skip.header.line.count"="1").
> When querying from Hive the header row is not included in the data, but when 
> running the same query via HiveContext I get the header row.
> "show create table " via the HiveContext confirms that it is aware of the 
> setting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-11374) skip.header.line.count is ignored in HiveContext

2016-08-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-11374:


Assignee: Apache Spark

> skip.header.line.count is ignored in HiveContext
> 
>
> Key: SPARK-11374
> URL: https://issues.apache.org/jira/browse/SPARK-11374
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.1
>Reporter: Daniel Haviv
>Assignee: Apache Spark
>
> csv table in Hive which is configured to skip the header row using 
> TBLPROPERTIES("skip.header.line.count"="1").
> When querying from Hive the header row is not included in the data, but when 
> running the same query via HiveContext I get the header row.
> "show create table " via the HiveContext confirms that it is aware of the 
> setting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2243) Support multiple SparkContexts in the same JVM

2016-08-14 Thread Stephen Boesch (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420467#comment-15420467
 ] 

Stephen Boesch commented on SPARK-2243:
---

Given that this is not going to be fixed, please update the documentation and fix 
the following warning:

{noformat}
WARN SparkContext: Multiple running SparkContexts detected in the same JVM!
org.apache.spark.SparkException: Only one SparkContext may be running in
this JVM (see SPARK-2243). To ignore this error,
set spark.driver.allowMultipleContexts = true
{noformat}
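
For completeness, a minimal sketch of the escape hatch that the warning itself 
mentions (this only silences the check; running several contexts in one JVM 
remains unsupported):
{code}
import org.apache.spark.{SparkConf, SparkContext}

// Assumes an application that genuinely wants a second context in the same JVM.
val conf = new SparkConf()
  .setAppName("second-context")                        // illustrative app name
  .setMaster("local[2]")                               // illustrative master
  .set("spark.driver.allowMultipleContexts", "true")   // suppresses the check above
val sc2 = new SparkContext(conf)
{code}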



> Support multiple SparkContexts in the same JVM
> --
>
> Key: SPARK-2243
> URL: https://issues.apache.org/jira/browse/SPARK-2243
> Project: Spark
>  Issue Type: New Feature
>  Components: Block Manager, Spark Core
>Affects Versions: 0.7.0, 1.0.0, 1.1.0
>Reporter: Miguel Angel Fernandez Diaz
>
> We're developing a platform where we create several Spark contexts for 
> carrying out different calculations. Is there any restriction when using 
> several Spark contexts? We have two contexts, one for Spark calculations and 
> another one for Spark Streaming jobs. The next error arises when we first 
> execute a Spark calculation and, once the execution is finished, a Spark 
> Streaming job is launched:
> {code}
> 14/06/23 16:40:08 ERROR executor.Executor: Exception in task ID 0
> java.io.FileNotFoundException: http://172.19.0.215:47530/broadcast_0
>   at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1624)
>   at 
> org.apache.spark.broadcast.HttpBroadcast$.read(HttpBroadcast.scala:156)
>   at 
> org.apache.spark.broadcast.HttpBroadcast.readObject(HttpBroadcast.scala:56)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>   at 
> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
>   at 
> org.apache.spark.scheduler.ResultTask$.deserializeInfo(ResultTask.scala:63)
>   at 
> org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:139)
>   at 
> java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1837)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>   at 
> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
>   at 
> org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:62)
>   at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:193)
>   at 
> org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:45)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> 14/06/23 16:40:08 WARN scheduler.TaskSetManager: Lost TID 0 (task 0.0:0)
> 14/06/23 16:40:08 WARN scheduler.TaskSetManager: Loss was due to 
> java.io.FileNotFoundException
> java.io.FileNotFoundException: http://172.19.0.215:47530/broadcast_0
>   at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1624)
>   at 
> org.apache.spark.broadcast.HttpBroadcast$.read(HttpBroadcast.scala:156)
>   at 
> org.apache.spark.broadcast.HttpBroadcast.readObject(HttpBroadcast.scala:56)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> 

[jira] [Commented] (SPARK-17053) Spark ignores hive.exec.drop.ignorenonexistent=true option

2016-08-14 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420456#comment-15420456
 ] 

Dongjoon Hyun commented on SPARK-17053:
---

Oh, I see!

> Spark ignores hive.exec.drop.ignorenonexistent=true option
> --
>
> Key: SPARK-17053
> URL: https://issues.apache.org/jira/browse/SPARK-17053
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Gokhan Civan
>
> In version 1.6.1, the following does not throw an exception:
> create table a as select 1; drop table a; drop table a;
> In version 2.0.0, the second drop fails; this is not compatible with Hive.
> The same problem exists for views.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-17053) Spark ignores hive.exec.drop.ignorenonexistent=true option

2016-08-14 Thread Gokhan Civan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gokhan Civan updated SPARK-17053:
-
Summary: Spark ignores hive.exec.drop.ignorenonexistent=true option  (was: 
DROP statement should not require IF EXISTS)

> Spark ignores hive.exec.drop.ignorenonexistent=true option
> --
>
> Key: SPARK-17053
> URL: https://issues.apache.org/jira/browse/SPARK-17053
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Gokhan Civan
>
> In version 1.6.1, the following does not throw an exception:
> create table a as select 1; drop table a; drop table a;
> In version 2.0.0, the second drop fails; this is not compatible with Hive.
> The same problem exists for views.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-17053) DROP statement should not require IF EXISTS

2016-08-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17053:


Assignee: (was: Apache Spark)

> DROP statement should not require IF EXISTS
> ---
>
> Key: SPARK-17053
> URL: https://issues.apache.org/jira/browse/SPARK-17053
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Gokhan Civan
>
> In version 1.6.1, the following does not throw an exception:
> create table a as select 1; drop table a; drop table a;
> In version 2.0.0, the second drop fails; this is not compatible with Hive.
> The same problem exists for views.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-17053) DROP statement should not require IF EXISTS

2016-08-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17053:


Assignee: Apache Spark

> DROP statement should not require IF EXISTS
> ---
>
> Key: SPARK-17053
> URL: https://issues.apache.org/jira/browse/SPARK-17053
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Gokhan Civan
>Assignee: Apache Spark
>
> In version 1.6.1, the following does not throw an exception:
> create table a as select 1; drop table a; drop table a;
> In version 2.0.0, the second drop fails; this is not compatible with Hive.
> The same problem exists for views.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17053) DROP statement should not require IF EXISTS

2016-08-14 Thread Qifan Pu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420443#comment-15420443
 ] 

Qifan Pu commented on SPARK-17053:
--

[~dongjoon] Sorry, it was an accidental click.

> DROP statement should not require IF EXISTS
> ---
>
> Key: SPARK-17053
> URL: https://issues.apache.org/jira/browse/SPARK-17053
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Gokhan Civan
>
> In version 1.6.1, the following does not throw an exception:
> create table a as select 1; drop table a; drop table a;
> In version 2.0.0, the second drop fails; this is not compatible with Hive.
> The same problem exists for views.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-17053) DROP statement should not require IF EXISTS

2016-08-14 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420430#comment-15420430
 ] 

Dongjoon Hyun edited comment on SPARK-17053 at 8/14/16 6:32 PM:


Oh, I mentioned the wrong username. Hi, [~qifan]. Could you leave a comment for 
the audit when you close an issue as 'WON'T FIX'?


was (Author: dongjoon):
Oh, I mentioned wrong username. Hi, [~qifan]. Could you leave some comment when 
you close issue as 'WON'T FIX' for the audit?

> DROP statement should not require IF EXISTS
> ---
>
> Key: SPARK-17053
> URL: https://issues.apache.org/jira/browse/SPARK-17053
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Gokhan Civan
>
> In version 1.6.1, the following does not throw an exception:
> create table a as select 1; drop table a; drop table a;
> In version 2.0.0, the second drop fails; this is not compatible with Hive.
> The same problem exists for views.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17053) DROP statement should not require IF EXISTS

2016-08-14 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420430#comment-15420430
 ] 

Dongjoon Hyun commented on SPARK-17053:
---

Oh, I mentioned the wrong username. Hi, [~qifan]. Could you leave a comment when 
you close an issue as 'WON'T FIX', for the audit?

> DROP statement should not require IF EXISTS
> ---
>
> Key: SPARK-17053
> URL: https://issues.apache.org/jira/browse/SPARK-17053
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Gokhan Civan
>
> In version 1.6.1, the following does not throw an exception:
> create table a as select 1; drop table a; drop table a;
> In version 2.0.0, the second drop fails; this is not compatible with Hive.
> The same problem exists for views.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-17053) DROP statement should not require IF EXISTS

2016-08-14 Thread Dongjoon Hyun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reopened SPARK-17053:
---

Hi, [~superpanpan]. 
Could you leave a comment explaining why you closed this issue as `WON'T FIX`?

> DROP statement should not require IF EXISTS
> ---
>
> Key: SPARK-17053
> URL: https://issues.apache.org/jira/browse/SPARK-17053
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Gokhan Civan
>
> In version 1.6.1, the following does not throw an exception:
> create table a as select 1; drop table a; drop table a;
> In version 2.0.0, the second drop fails; this is not compatible with Hive.
> The same problem exists for views.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16967) Collect Mesos support code into a module/profile

2016-08-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16967:


Assignee: (was: Apache Spark)

> Collect Mesos support code into a module/profile
> 
>
> Key: SPARK-16967
> URL: https://issues.apache.org/jira/browse/SPARK-16967
> Project: Spark
>  Issue Type: Task
>  Components: Mesos, Spark Core
>Affects Versions: 2.0.0
>Reporter: Sean Owen
>Priority: Critical
>
> CC [~mgummelt] [~tnachen] [~skonto] 
> I think this is fairly easy and would be beneficial as more work goes into 
> Mesos. It should separate into a module like YARN does, just on principle 
> really, but because it also means anyone that doesn't need Mesos support can 
> build without it.
> I'm entirely willing to take a shot at this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16967) Collect Mesos support code into a module/profile

2016-08-14 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420425#comment-15420425
 ] 

Apache Spark commented on SPARK-16967:
--

User 'mgummelt' has created a pull request for this issue:
https://github.com/apache/spark/pull/14637

> Collect Mesos support code into a module/profile
> 
>
> Key: SPARK-16967
> URL: https://issues.apache.org/jira/browse/SPARK-16967
> Project: Spark
>  Issue Type: Task
>  Components: Mesos, Spark Core
>Affects Versions: 2.0.0
>Reporter: Sean Owen
>Priority: Critical
>
> CC [~mgummelt] [~tnachen] [~skonto] 
> I think this is fairly easy and would be beneficial as more work goes into 
> Mesos. It should separate into a module like YARN does, just on principle 
> really, but because it also means anyone that doesn't need Mesos support can 
> build without it.
> I'm entirely willing to take a shot at this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16967) Collect Mesos support code into a module/profile

2016-08-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16967:


Assignee: Apache Spark

> Collect Mesos support code into a module/profile
> 
>
> Key: SPARK-16967
> URL: https://issues.apache.org/jira/browse/SPARK-16967
> Project: Spark
>  Issue Type: Task
>  Components: Mesos, Spark Core
>Affects Versions: 2.0.0
>Reporter: Sean Owen
>Assignee: Apache Spark
>Priority: Critical
>
> CC [~mgummelt] [~tnachen] [~skonto] 
> I think this is fairly easy and would be beneficial as more work goes into 
> Mesos. It should separate into a module like YARN does, just on principle 
> really, but because it also means anyone that doesn't need Mesos support can 
> build without it.
> I'm entirely willing to take a shot at this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-17053) DROP statement should not require IF EXISTS

2016-08-14 Thread Qifan Pu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qifan Pu resolved SPARK-17053.
--
Resolution: Won't Fix

> DROP statement should not require IF EXISTS
> ---
>
> Key: SPARK-17053
> URL: https://issues.apache.org/jira/browse/SPARK-17053
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Gokhan Civan
>
> In version 1.6.1, the following does not throw an exception:
> create table a as select 1; drop table a; drop table a;
> In version 2.0.0, the second drop fails; this is not compatible with Hive.
> The same problem exists for views.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-17053) DROP statement should not require IF EXISTS

2016-08-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17053:


Assignee: (was: Apache Spark)

> DROP statement should not require IF EXISTS
> ---
>
> Key: SPARK-17053
> URL: https://issues.apache.org/jira/browse/SPARK-17053
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Gokhan Civan
>
> In version 1.6.1, the following does not throw an exception:
> create table a as select 1; drop table a; drop table a;
> In version 2.0.0, the second drop fails; this is not compatible with Hive.
> The same problem exists for views.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17053) DROP statement should not require IF EXISTS

2016-08-14 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420421#comment-15420421
 ] 

Apache Spark commented on SPARK-17053:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/14636

> DROP statement should not require IF EXISTS
> ---
>
> Key: SPARK-17053
> URL: https://issues.apache.org/jira/browse/SPARK-17053
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Gokhan Civan
>
> In version 1.6.1, the following does not throw an exception:
> create table a as select 1; drop table a; drop table a;
> In version 2.0.0, the second drop fails; this is not compatible with Hive.
> The same problem exists for views.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-17053) DROP statement should not require IF EXISTS

2016-08-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17053:


Assignee: Apache Spark

> DROP statement should not require IF EXISTS
> ---
>
> Key: SPARK-17053
> URL: https://issues.apache.org/jira/browse/SPARK-17053
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Gokhan Civan
>Assignee: Apache Spark
>
> In version 1.6.1, the following does not throw an exception:
> create table a as select 1; drop table a; drop table a;
> In version 2.0.0, the second drop fails; this is not compatible with Hive.
> The same problem exists for views.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17053) DROP statement should not require IF EXISTS

2016-08-14 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420405#comment-15420405
 ] 

Dongjoon Hyun commented on SPARK-17053:
---

Oh, I see what you mean. Yes, indeed. Currently, Spark ignores the option.
{code}
scala> sql("set hive.exec.drop.ignorenonexistent=true")
res1: org.apache.spark.sql.DataFrame = [key: string, value: string]

scala> sql("drop table a")
org.apache.spark.sql.AnalysisException: Table to drop '`a`' does not exist;
{code}
I'll make a PR soon. Could you update the title to be more specific?
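
In the meantime, a minimal sketch of the form that does behave as expected today, 
in the same spark-shell style:
{code}
scala> sql("DROP TABLE IF EXISTS a")   // succeeds whether or not table `a` exists
{code}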

> DROP statement should not require IF EXISTS
> ---
>
> Key: SPARK-17053
> URL: https://issues.apache.org/jira/browse/SPARK-17053
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Gokhan Civan
>
> In version 1.6.1, the following does not throw an exception:
> create table a as select 1; drop table a; drop table a;
> In version 2.0.0, the second drop fails; this is not compatible with Hive.
> The same problem exists for views.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17053) DROP statement should not require IF EXISTS

2016-08-14 Thread Gokhan Civan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420398#comment-15420398
 ] 

Gokhan Civan commented on SPARK-17053:
--

I have seen a lot of code that uses DROP without IF EXISTS, and I did find it 
strange. I guess there was an ignorenonexistent variable lurking around 
somewhere.

So support for this variable was somehow built into 1.6.1 and later removed?

> DROP statement should not require IF EXISTS
> ---
>
> Key: SPARK-17053
> URL: https://issues.apache.org/jira/browse/SPARK-17053
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Gokhan Civan
>
> In version 1.6.1, the following does not throw an exception:
> create table a as select 1; drop table a; drop table a;
> In version 2.0.0, the second drop fails; this is not compatible with Hive.
> The same problem exists for views.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17053) DROP statement should not require IF EXISTS

2016-08-14 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420392#comment-15420392
 ] 

Sean Owen commented on SPARK-17053:
---

That would imply that "IF EXISTS" is redundant, that the behavior is always "IF 
EXISTS".

The Hive docs also disagree:

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL

In Hive 0.7.0 or later, DROP returns an error if the table doesn't exist, 
unless IF EXISTS is specified or the configuration variable 
hive.exec.drop.ignorenonexistent is set to true.


> DROP statement should not require IF EXISTS
> ---
>
> Key: SPARK-17053
> URL: https://issues.apache.org/jira/browse/SPARK-17053
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Gokhan Civan
>
> In version 1.6.1, the following does not throw an exception:
> create table a as select 1; drop table a; drop table a;
> In version 2.0.0, the second drop fails; this is not compatible with Hive.
> The same problem exists for views.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-17053) DROP statement should not require IF EXISTS

2016-08-14 Thread Gokhan Civan (JIRA)
Gokhan Civan created SPARK-17053:


 Summary: DROP statement should not require IF EXISTS
 Key: SPARK-17053
 URL: https://issues.apache.org/jira/browse/SPARK-17053
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Gokhan Civan


In version 1.6.1, the following does not throw an exception:

create table a as select 1; drop table a; drop table a;

In version 2.0.0, the second drop fails; this is not compatible with Hive.

The same problem exists for views.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-17052) Remove Duplicate Test Cases auto_join from HiveCompatibilitySuite.scala

2016-08-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17052:


Assignee: (was: Apache Spark)

> Remove Duplicate Test Cases auto_join from HiveCompatibilitySuite.scala
> ---
>
> Key: SPARK-17052
> URL: https://issues.apache.org/jira/browse/SPARK-17052
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>Priority: Minor
>
> The original [JIRA 
> Hive-1642](https://issues.apache.org/jira/browse/HIVE-1642) delivered the 
> test cases `auto_joinXYZ` for verifying the results when the joins are 
> automatically converted to map-join. Basically, most of them are just copied 
> from the corresponding `joinXYZ`. 
> After comparison between `auto_joinXYZ` and `joinXYZ`, below is a list of 
> duplicate cases:
> {noformat}
> "auto_join0",
> "auto_join1",
> "auto_join10",
> "auto_join11",
> "auto_join12",
> "auto_join13",
> "auto_join14",
> "auto_join14_hadoop20",
> "auto_join15",
> "auto_join17",
> "auto_join18",
> "auto_join2",
> "auto_join20",
> "auto_join21",
> "auto_join23",
> "auto_join24",
> "auto_join3",
> "auto_join4",
> "auto_join5",
> "auto_join6",
> "auto_join7",
> "auto_join8",
> "auto_join9"
> {noformat}
> We can remove all of them without affecting the test coverage. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-17052) Remove Duplicate Test Cases auto_join from HiveCompatibilitySuite.scala

2016-08-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17052:


Assignee: Apache Spark

> Remove Duplicate Test Cases auto_join from HiveCompatibilitySuite.scala
> ---
>
> Key: SPARK-17052
> URL: https://issues.apache.org/jira/browse/SPARK-17052
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>Assignee: Apache Spark
>Priority: Minor
>
> The original [JIRA 
> Hive-1642](https://issues.apache.org/jira/browse/HIVE-1642) delivered the 
> test cases `auto_joinXYZ` for verifying the results when the joins are 
> automatically converted to map-join. Basically, most of them are just copied 
> from the corresponding `joinXYZ`. 
> After comparison between `auto_joinXYZ` and `joinXYZ`, below is a list of 
> duplicate cases:
> {noformat}
> "auto_join0",
> "auto_join1",
> "auto_join10",
> "auto_join11",
> "auto_join12",
> "auto_join13",
> "auto_join14",
> "auto_join14_hadoop20",
> "auto_join15",
> "auto_join17",
> "auto_join18",
> "auto_join2",
> "auto_join20",
> "auto_join21",
> "auto_join23",
> "auto_join24",
> "auto_join3",
> "auto_join4",
> "auto_join5",
> "auto_join6",
> "auto_join7",
> "auto_join8",
> "auto_join9"
> {noformat}
> We can remove all of them without affecting the test coverage. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17052) Remove Duplicate Test Cases auto_join from HiveCompatibilitySuite.scala

2016-08-14 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420385#comment-15420385
 ] 

Apache Spark commented on SPARK-17052:
--

User 'gatorsmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/14635

> Remove Duplicate Test Cases auto_join from HiveCompatibilitySuite.scala
> ---
>
> Key: SPARK-17052
> URL: https://issues.apache.org/jira/browse/SPARK-17052
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>Priority: Minor
>
> The original [JIRA 
> Hive-1642](https://issues.apache.org/jira/browse/HIVE-1642) delivered the 
> test cases `auto_joinXYZ` for verifying the results when the joins are 
> automatically converted to map-join. Basically, most of them are just copied 
> from the corresponding `joinXYZ`. 
> After comparison between `auto_joinXYZ` and `joinXYZ`, below is a list of 
> duplicate cases:
> {noformat}
> "auto_join0",
> "auto_join1",
> "auto_join10",
> "auto_join11",
> "auto_join12",
> "auto_join13",
> "auto_join14",
> "auto_join14_hadoop20",
> "auto_join15",
> "auto_join17",
> "auto_join18",
> "auto_join2",
> "auto_join20",
> "auto_join21",
> "auto_join23",
> "auto_join24",
> "auto_join3",
> "auto_join4",
> "auto_join5",
> "auto_join6",
> "auto_join7",
> "auto_join8",
> "auto_join9"
> {noformat}
> We can remove all of them without affecting the test coverage. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-17052) Remove Duplicate Test Cases auto_join from HiveCompatibilitySuite.scala

2016-08-14 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-17052:

Priority: Minor  (was: Major)

> Remove Duplicate Test Cases auto_join from HiveCompatibilitySuite.scala
> ---
>
> Key: SPARK-17052
> URL: https://issues.apache.org/jira/browse/SPARK-17052
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>Priority: Minor
>
> The original [JIRA 
> Hive-1642](https://issues.apache.org/jira/browse/HIVE-1642) delivered the 
> test cases `auto_joinXYZ` for verifying the results when the joins are 
> automatically converted to map-join. Basically, most of them are just copied 
> from the corresponding `joinXYZ`. 
> After comparison between `auto_joinXYZ` and `joinXYZ`, below is a list of 
> duplicate cases:
> {noformat}
> "auto_join0",
> "auto_join1",
> "auto_join10",
> "auto_join11",
> "auto_join12",
> "auto_join13",
> "auto_join14",
> "auto_join14_hadoop20",
> "auto_join15",
> "auto_join17",
> "auto_join18",
> "auto_join2",
> "auto_join20",
> "auto_join21",
> "auto_join23",
> "auto_join24",
> "auto_join3",
> "auto_join4",
> "auto_join5",
> "auto_join6",
> "auto_join7",
> "auto_join8",
> "auto_join9"
> {noformat}
> We can remove all of them without affecting the test coverage. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-17052) Remove Duplicate Test Cases auto_join from HiveCompatibilitySuite.scala

2016-08-14 Thread Xiao Li (JIRA)
Xiao Li created SPARK-17052:
---

 Summary: Remove Duplicate Test Cases auto_join from 
HiveCompatibilitySuite.scala
 Key: SPARK-17052
 URL: https://issues.apache.org/jira/browse/SPARK-17052
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.0.0
Reporter: Xiao Li


The original [JIRA Hive-1642](https://issues.apache.org/jira/browse/HIVE-1642) 
delivered the test cases `auto_joinXYZ` for verifying the results when the 
joins are automatically converted to map-join. Basically, most of them are just 
copied from the corresponding `joinXYZ`. 

After comparison between `auto_joinXYZ` and `joinXYZ`, below is a list of 
duplicate cases:
{noformat}
"auto_join0",
"auto_join1",
"auto_join10",
"auto_join11",
"auto_join12",
"auto_join13",
"auto_join14",
"auto_join14_hadoop20",
"auto_join15",
"auto_join17",
"auto_join18",
"auto_join2",
"auto_join20",
"auto_join21",
"auto_join23",
"auto_join24",
"auto_join3",
"auto_join4",
"auto_join5",
"auto_join6",
"auto_join7",
"auto_join8",
"auto_join9"
{noformat}

We can remove all of them without affecting the test coverage. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17051) we should use hadoopConf in InsertIntoHiveTable

2016-08-14 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420350#comment-15420350
 ] 

Apache Spark commented on SPARK-17051:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/14634

> we should use hadoopConf in InsertIntoHiveTable
> ---
>
> Key: SPARK-17051
> URL: https://issues.apache.org/jira/browse/SPARK-17051
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-17051) we should use hadoopConf in InsertIntoHiveTable

2016-08-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17051:


Assignee: Wenchen Fan  (was: Apache Spark)

> we should use hadoopConf in InsertIntoHiveTable
> ---
>
> Key: SPARK-17051
> URL: https://issues.apache.org/jira/browse/SPARK-17051
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-17051) we should use hadoopConf in InsertIntoHiveTable

2016-08-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17051:


Assignee: Apache Spark  (was: Wenchen Fan)

> we should use hadoopConf in InsertIntoHiveTable
> ---
>
> Key: SPARK-17051
> URL: https://issues.apache.org/jira/browse/SPARK-17051
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-17051) we should use hadoopConf in InsertIntoHiveTable

2016-08-14 Thread Wenchen Fan (JIRA)
Wenchen Fan created SPARK-17051:
---

 Summary: we should use hadoopConf in InsertIntoHiveTable
 Key: SPARK-17051
 URL: https://issues.apache.org/jira/browse/SPARK-17051
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Wenchen Fan
Assignee: Wenchen Fan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17050) Improve initKMeansParallel with treeAggregate

2016-08-14 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420336#comment-15420336
 ] 

Sean Owen commented on SPARK-17050:
---

[~WeichenXu123] This sounds closely related to 
https://issues.apache.org/jira/browse/SPARK-17033. I don't think we should 
open separate JIRAs for the exact same logical change in different classes. 
Let's put these together into SPARK-17033.

> Improve initKMeansParallel with treeAggregate
> -
>
> Key: SPARK-17050
> URL: https://issues.apache.org/jira/browse/SPARK-17050
> Project: Spark
>  Issue Type: Improvement
>Reporter: Weichen Xu
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The `initKMeansParallel` use `rdd.aggregate`, it is better to use 
> `treeAggregate` to get better performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-17050) Improve initKMeansParallel with treeAggregate

2016-08-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17050:


Assignee: Apache Spark

> Improve initKMeansParallel with treeAggregate
> -
>
> Key: SPARK-17050
> URL: https://issues.apache.org/jira/browse/SPARK-17050
> Project: Spark
>  Issue Type: Improvement
>Reporter: Weichen Xu
>Assignee: Apache Spark
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The `initKMeansParallel` use `rdd.aggregate`, it is better to use 
> `treeAggregate` to get better performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-17050) Improve initKMeansParallel with treeAggregate

2016-08-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17050:


Assignee: (was: Apache Spark)

> Improve initKMeansParallel with treeAggregate
> -
>
> Key: SPARK-17050
> URL: https://issues.apache.org/jira/browse/SPARK-17050
> Project: Spark
>  Issue Type: Improvement
>Reporter: Weichen Xu
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The `initKMeansParallel` use `rdd.aggregate`, it is better to use 
> `treeAggregate` to get better performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17050) Improve initKMeansParallel with treeAggregate

2016-08-14 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420325#comment-15420325
 ] 

Apache Spark commented on SPARK-17050:
--

User 'WeichenXu123' has created a pull request for this issue:
https://github.com/apache/spark/pull/14628

> Improve initKMeansParallel with treeAggregate
> -
>
> Key: SPARK-17050
> URL: https://issues.apache.org/jira/browse/SPARK-17050
> Project: Spark
>  Issue Type: Improvement
>Reporter: Weichen Xu
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The `initKMeansParallel` use `rdd.aggregate`, it is better to use 
> `treeAggregate` to get better performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-17050) Improve initKMeansParallel with treeAggregate

2016-08-14 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-17050:
--

 Summary: Improve initKMeansParallel with treeAggregate
 Key: SPARK-17050
 URL: https://issues.apache.org/jira/browse/SPARK-17050
 Project: Spark
  Issue Type: Improvement
Reporter: Weichen Xu


`initKMeansParallel` currently uses `rdd.aggregate`; it would be better to use 
`treeAggregate` for better performance.
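
A minimal sketch of the API difference on toy data (not the actual k-means|| code; 
the tree depth is what keeps the final merge on the driver small):
{code}
import org.apache.spark.mllib.linalg.Vectors

// Toy vectors standing in for the points whose costs are aggregated.
val data = sc.parallelize(1 to 1000000).map(i => Vectors.dense(i.toDouble, (i % 7).toDouble))

// rdd.aggregate: every partition's partial result is merged directly on the driver.
val total1 = data.aggregate(0.0)(
  (acc, v) => acc + Vectors.norm(v, 2.0),
  _ + _)

// treeAggregate: partial results are pre-combined in intermediate stages,
// so the driver only merges a handful of values.
val total2 = data.treeAggregate(0.0)(
  (acc, v) => acc + Vectors.norm(v, 2.0),
  _ + _,
  depth = 2)
{code}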



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-6273) Got error when one table's alias name is the same with other table's column name

2016-08-14 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reopened SPARK-6273:
--

> Got error when one table's alias name is the same with other table's column 
> name
> 
>
> Key: SPARK-6273
> URL: https://issues.apache.org/jira/browse/SPARK-6273
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.1, 1.3.1
>Reporter: Jeff
>
> while one table's alias name is the same with other table's column name
> get the error Ambiguous references
> {code}
> Error: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: 
> Ambiguous references to salary.pay_date: 
> (pay_date#34749,List()),(salary#34792,List(pay_date)), tree:
> 'Filter 'salary.pay_date = 'time_by_day.the_date) && 
> ('time_by_day.the_year = 1997.0)) && ('salary.employee_id = 
> 'employee.employee_id)) && ('employee.store_id = 'store.store_id))
>  Join Inner, None
>   Join Inner, None
>Join Inner, None
> MetastoreRelation yxqtest, time_by_day, Some(time_by_day)
> MetastoreRelation yxqtest, salary, Some(salary)
>MetastoreRelation yxqtest, store, Some(store)
>   MetastoreRelation yxqtest, employee, Some(employee) (state=,code=0)
> Error: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: 
> Ambiguous references to salary.pay_date: 
> (pay_date#34749,List()),(salary#34792,List(pay_date)), tree:
> 'Filter 'salary.pay_date = 'time_by_day.the_date) && 
> ('time_by_day.the_year = 1997.0)) && ('salary.employee_id = 
> 'employee.employee_id)) && ('employee.store_id = 'store.store_id))
>  Join Inner, None
>   Join Inner, None
>Join Inner, None
> MetastoreRelation yxqtest, time_by_day, Some(time_by_day)
> MetastoreRelation yxqtest, salary, Some(salary)
>MetastoreRelation yxqtest, store, Some(store)
>   MetastoreRelation yxqtest, employee, Some(employee) (state=,code=0)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-6273) Got error when one table's alias name is the same with other table's column name

2016-08-14 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-6273.
--
   Resolution: Duplicate
Fix Version/s: (was: 2.0.0)

> Got error when one table's alias name is the same with other table's column 
> name
> 
>
> Key: SPARK-6273
> URL: https://issues.apache.org/jira/browse/SPARK-6273
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.1, 1.3.1
>Reporter: Jeff
>
> while one table's alias name is the same with other table's column name
> get the error Ambiguous references
> {code}
> Error: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: 
> Ambiguous references to salary.pay_date: 
> (pay_date#34749,List()),(salary#34792,List(pay_date)), tree:
> 'Filter 'salary.pay_date = 'time_by_day.the_date) && 
> ('time_by_day.the_year = 1997.0)) && ('salary.employee_id = 
> 'employee.employee_id)) && ('employee.store_id = 'store.store_id))
>  Join Inner, None
>   Join Inner, None
>Join Inner, None
> MetastoreRelation yxqtest, time_by_day, Some(time_by_day)
> MetastoreRelation yxqtest, salary, Some(salary)
>MetastoreRelation yxqtest, store, Some(store)
>   MetastoreRelation yxqtest, employee, Some(employee) (state=,code=0)
> Error: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: 
> Ambiguous references to salary.pay_date: 
> (pay_date#34749,List()),(salary#34792,List(pay_date)), tree:
> 'Filter 'salary.pay_date = 'time_by_day.the_date) && 
> ('time_by_day.the_year = 1997.0)) && ('salary.employee_id = 
> 'employee.employee_id)) && ('employee.store_id = 'store.store_id))
>  Join Inner, None
>   Join Inner, None
>Join Inner, None
> MetastoreRelation yxqtest, time_by_day, Some(time_by_day)
> MetastoreRelation yxqtest, salary, Some(salary)
>MetastoreRelation yxqtest, store, Some(store)
>   MetastoreRelation yxqtest, employee, Some(employee) (state=,code=0)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-17027) PolynomialExpansion.choose is prone to integer overflow

2016-08-14 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-17027:
--
Fix Version/s: (was: 1.6.3)

> PolynomialExpansion.choose is prone to integer overflow 
> 
>
> Key: SPARK-17027
> URL: https://issues.apache.org/jira/browse/SPARK-17027
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 1.6.0, 2.0.0
>Reporter: Maciej Szymkiewicz
>Assignee: Maciej Szymkiewicz
>Priority: Minor
> Fix For: 2.0.1, 2.1.0
>
>
> Current implementation computes power of k directly and because of that it is 
> susceptible to integer overflow on relatively small input (4 features, degree 
> equal 10).  It would be better to use recursive formula instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-16843) Select features according to a percentile of the highest scores of ChiSqSelector

2016-08-14 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-16843.
---
Resolution: Duplicate

I'd like to call this a subset of the more general changes to chi-squared 
selection proposed in SPARK-17017 (see the PR for more detail: 
https://github.com/apache/spark/pull/14597).
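
For context, the existing MLlib ChiSqSelector only takes a fixed feature count; the 
percentile variant sketched below is purely hypothetical, illustrating the shape of 
the proposed Param rather than an existing API:
{code}
import org.apache.spark.mllib.feature.ChiSqSelector

// Existing API: keep a fixed number of top-scoring features.
val selectorByCount = new ChiSqSelector(50)

// Hypothetical shape of the proposed percentile Param (not an existing method):
// val selectorByPercentile = new ChiSqSelector().setSelectorType("percentile").setPercentile(0.1)
{code}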

> Select features according to a percentile of the highest scores of 
> ChiSqSelector
> 
>
> Key: SPARK-16843
> URL: https://issues.apache.org/jira/browse/SPARK-16843
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Peng Meng
>Priority: Minor
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> It would be handy to add a percentile Param to ChiSqSelector, as in the 
> scikit-learn one: 
> http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectPercentile.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-16885) Spark shell failed to run in yarn-client mode

2016-08-14 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-16885.
---
Resolution: Not A Problem

> Spark shell failed to run in yarn-client mode
> -
>
> Key: SPARK-16885
> URL: https://issues.apache.org/jira/browse/SPARK-16885
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 2.0.0
> Environment: Ubuntu 12.04
> Hadoop 2.7.2 + Yarn
>Reporter: Yury Zhyshko
> Attachments: spark-env.sh
>
>
> I've installed Hadoop + Yarn in pseudo distributed mode following these 
> instructions: 
> https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SingleCluster.html#YARN_on_a_Single_Node
> After that I downloaded and installed a prebuild Spark for Hadoop 2.7
> The command that I used to run a shell: 
> ./bin/spark-shell --master yarn --deploy-mode client --conf 
> spark.yarn.archive=/home/yzhishko/work/spark/jars
> Here is the error:
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel).
> 16/08/03 17:13:50 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 16/08/03 17:13:52 ERROR spark.SparkContext: Error initializing SparkContext.
> java.lang.IllegalArgumentException: Can not create a Path from an empty string
>   at org.apache.hadoop.fs.Path.checkPathArg(Path.java:126)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:134)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:93)
>   at 
> org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:338)
>   at 
> org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:433)
>   at 
> org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:472)
>   at 
> org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:834)
>   at 
> org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:167)
>   at 
> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:149)
>   at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
>   at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2256)
>   at 
> org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:831)
>   at 
> org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:823)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:823)
>   at org.apache.spark.repl.Main$.createSparkSession(Main.scala:101)
>   at $line3.$read$$iw$$iw.<init>(<console>:15)
>   at $line3.$read$$iw.<init>(<console>:31)
>   at $line3.$read.<init>(<console>:33)
>   at $line3.$read$.<init>(<console>:37)
>   at $line3.$read$.<clinit>(<console>)
>   at $line3.$eval$.$print$lzycompute(<console>:7)
>   at $line3.$eval$.$print(<console>:6)
>   at $line3.$eval.$print()
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786)
>   at 
> scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047)
>   at 
> scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638)
>   at 
> scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637)
>   at 
> scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
>   at 
> scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
>   at 
> scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:637)
>   at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:569)
>   at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)
>   at 
> scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:807)
>   at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:681)
>   at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:395)
>   at 
> org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply$mcV$sp(SparkILoop.scala:38)
>   at 
> org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:37)
>   at 
> org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:37)
>   at 

[jira] [Updated] (SPARK-17027) PolynomialExpansion.choose is prone to integer overflow

2016-08-14 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-17027:
--
Assignee: Maciej Szymkiewicz

> PolynomialExpansion.choose is prone to integer overflow 
> 
>
> Key: SPARK-17027
> URL: https://issues.apache.org/jira/browse/SPARK-17027
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 1.6.0, 2.0.0
>Reporter: Maciej Szymkiewicz
>Assignee: Maciej Szymkiewicz
>Priority: Minor
> Fix For: 1.6.3, 2.0.1, 2.1.0
>
>
> Current implementation computes power of k directly and because of that it is 
> susceptible to integer overflow on relatively small input (4 features, degree 
> equal 10).  It would be better to use recursive formula instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-17027) PolynomialExpansion.choose is prone to integer overflow

2016-08-14 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-17027.
---
   Resolution: Fixed
Fix Version/s: 2.1.0
   2.0.1
   1.6.3

Issue resolved by pull request 14614
[https://github.com/apache/spark/pull/14614]
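
For context, a minimal sketch of the overflow-safe recursion described in the issue 
(Pascal's rule), purely illustrative and not the exact code from the pull request:
{code}
// C(n, k) = C(n-1, k-1) + C(n-1, k): additions only, so there are no large
// intermediate powers or factorials to overflow an Int.
def choose(n: Int, k: Int): Long = {
  require(k >= 0 && k <= n)
  if (k == 0 || k == n) 1L
  else choose(n - 1, k - 1) + choose(n - 1, k)
}

choose(13, 10)   // 286
{code}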

> PolynomialExpansion.choose is prone to integer overflow 
> 
>
> Key: SPARK-17027
> URL: https://issues.apache.org/jira/browse/SPARK-17027
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 1.6.0, 2.0.0
>Reporter: Maciej Szymkiewicz
>Priority: Minor
> Fix For: 1.6.3, 2.0.1, 2.1.0
>
>
> Current implementation computes power of k directly and because of that it is 
> susceptible to integer overflow on relatively small input (4 features, degree 
> equal 10).  It would be better to use recursive formula instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org