[jira] [Commented] (SPARK-17057) ProbabilisticClassifierModels' prediction more reasonable with multi zero thresholds
[ https://issues.apache.org/jira/browse/SPARK-17057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15420632#comment-15420632 ]

Apache Spark commented on SPARK-17057:
--------------------------------------

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/14643

> ProbabilisticClassifierModels' prediction more reasonable with multi zero
> thresholds
> -------------------------------------------------------------------------
>
>                 Key: SPARK-17057
>                 URL: https://issues.apache.org/jira/browse/SPARK-17057
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: zhengruifeng
>
> {code}
> val path = "./data/mllib/sample_multiclass_classification_data.txt"
> val data = spark.read.format("libsvm").load(path)
> val rfm = rf.fit(data)
>
> scala> rfm.setThresholds(Array(0.0, 0.0, 0.0))
> res4: org.apache.spark.ml.classification.RandomForestClassificationModel =
> RandomForestClassificationModel (uid=rfc_cbe640b0eccc) with 20 trees
>
> scala> rfm.transform(data).show(5)
> +-----+--------------------+--------------+-------------+----------+
> |label|            features| rawPrediction|  probability|prediction|
> +-----+--------------------+--------------+-------------+----------+
> |  1.0| (4,[0,1,2,3],[-0...|[0.0,20.0,0.0]|[0.0,1.0,0.0]|       0.0|
> |  1.0| (4,[0,1,2,3],[-0...|[0.0,20.0,0.0]|[0.0,1.0,0.0]|       0.0|
> |  1.0| (4,[0,1,2,3],[-0...|[0.0,20.0,0.0]|[0.0,1.0,0.0]|       0.0|
> |  1.0| (4,[0,1,2,3],[-0...|[0.0,20.0,0.0]|[0.0,1.0,0.0]|       0.0|
> |  0.0|(4,[0,1,2,3],[0.1...|[20.0,0.0,0.0]|[1.0,0.0,0.0]|       0.0|
> +-----+--------------------+--------------+-------------+----------+
> only showing top 5 rows
> {code}
>
> If multiple thresholds are set to zero, the prediction of
> {{ProbabilisticClassificationModel}} is the first index whose corresponding
> threshold is 0.
> However, in this case, the index with the maximum {{probability}} among the
> indices with a 0 threshold would be a more reasonable choice for
> {{prediction}}.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
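[Editorial note] The behaviour the issue describes, and the proposed alternative, can be sketched outside Spark. This is an illustrative plain-Java sketch with hypothetical method names, not Spark's actual `ProbabilisticClassificationModel` code:

```java
// Sketch of the thresholding logic described in SPARK-17057 (illustrative;
// method names are hypothetical, not Spark's API).
public class ThresholdPrediction {

    // Current behavior: every zero threshold scales its class to +Infinity,
    // so the argmax ties and the FIRST zero-threshold index wins.
    static int predictCurrent(double[] probability, double[] thresholds) {
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < probability.length; i++) {
            double score = (thresholds[i] == 0.0)
                ? Double.POSITIVE_INFINITY
                : probability[i] / thresholds[i];
            if (score > bestScore) { // strict '>' keeps the first tied index
                bestScore = score;
                best = i;
            }
        }
        return best;
    }

    // Proposed behavior: among the zero-threshold indices, pick the one with
    // the highest probability rather than simply the first.
    static int predictProposed(double[] probability, double[] thresholds) {
        int best = -1;
        double bestProb = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < probability.length; i++) {
            if (thresholds[i] == 0.0 && probability[i] > bestProb) {
                bestProb = probability[i];
                best = i;
            }
        }
        // No zero threshold at all: fall back to the scaled argmax.
        return (best >= 0) ? best : predictCurrent(probability, thresholds);
    }
}
```

With probability [0.0, 1.0, 0.0] and thresholds [0.0, 0.0, 0.0], the current rule returns index 0 (as in the `show(5)` output above), while the proposed rule returns index 1.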
[jira] [Assigned] (SPARK-17057) ProbabilisticClassifierModels' prediction more reasonable with multi zero thresholds
[ https://issues.apache.org/jira/browse/SPARK-17057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-17057:
------------------------------------

    Assignee:     (was: Apache Spark)
[jira] [Assigned] (SPARK-17057) ProbabilisticClassifierModels' prediction more reasonable with multi zero thresholds
[ https://issues.apache.org/jira/browse/SPARK-17057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-17057:
------------------------------------

    Assignee: Apache Spark
[jira] [Created] (SPARK-17057) ProbabilisticClassifierModels' prediction more reasonable with multi zero thresholds
zhengruifeng created SPARK-17057:
------------------------------------

             Summary: ProbabilisticClassifierModels' prediction more reasonable with multi zero thresholds
                 Key: SPARK-17057
                 URL: https://issues.apache.org/jira/browse/SPARK-17057
             Project: Spark
          Issue Type: Improvement
          Components: ML
            Reporter: zhengruifeng

If multiple thresholds are set to zero, the prediction of {{ProbabilisticClassificationModel}} is the first index whose corresponding threshold is 0. However, in this case, the index with the maximum {{probability}} among the indices with a 0 threshold would be a more reasonable choice for {{prediction}}.
[jira] [Assigned] (SPARK-17056) Fix a wrong assert in MemoryStore
[ https://issues.apache.org/jira/browse/SPARK-17056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-17056:
------------------------------------

    Assignee:     (was: Apache Spark)

> Fix a wrong assert in MemoryStore
> ---------------------------------
>
>                 Key: SPARK-17056
>                 URL: https://issues.apache.org/jira/browse/SPARK-17056
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>            Reporter: Liang-Chi Hsieh
>            Priority: Minor
>
> There is an assert in MemoryStore's putIteratorAsValues method which is used
> to check that unroll memory is not released too much. This assert looks wrong.
[jira] [Commented] (SPARK-17056) Fix a wrong assert in MemoryStore
[ https://issues.apache.org/jira/browse/SPARK-17056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15420627#comment-15420627 ]

Apache Spark commented on SPARK-17056:
--------------------------------------

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/14642
[jira] [Assigned] (SPARK-17056) Fix a wrong assert in MemoryStore
[ https://issues.apache.org/jira/browse/SPARK-17056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-17056:
------------------------------------

    Assignee: Apache Spark
[jira] [Created] (SPARK-17056) Fix a wrong assert in MemoryStore
Liang-Chi Hsieh created SPARK-17056:
---------------------------------------

             Summary: Fix a wrong assert in MemoryStore
                 Key: SPARK-17056
                 URL: https://issues.apache.org/jira/browse/SPARK-17056
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
            Reporter: Liang-Chi Hsieh
            Priority: Minor

There is an assert in MemoryStore's putIteratorAsValues method which is used to check that unroll memory is not released too much. This assert looks wrong.
[jira] [Updated] (SPARK-17055) add labelKFold to CrossValidator
[ https://issues.apache.org/jira/browse/SPARK-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vincent updated SPARK-17055:
----------------------------
    Description: 
Current CrossValidator only supports k-fold, which randomly divides all the samples into k groups of samples. But in cases when data is gathered from different subjects and we want to avoid over-fitting, we want to hold out samples with certain labels from the training data and put them into the validation fold, i.e. we want to ensure that the same label is not in both the testing and training sets. Mainstream packages like Sklearn already support such a cross-validation method. (http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.LabelKFold.html#sklearn.cross_validation.LabelKFold)

  was:
Current CrossValidator only supports k-fold, which randomly divides all the samples in k groups of samples. But in cases when data is gathered from different subjects and we want to avoid over-fitting, we want to hold out samples with certain labels from training data and put them into validation fold, i.e. we want to ensure that the same label is not in both testing and training sets. Mainstream package like Sklearn already supports such cross validation method.
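[Editorial note] The fold assignment the request describes can be sketched without Spark: give every distinct label (group) exactly one fold, so no label appears on both the training and validation side of a split. This is an illustrative sketch with a simple round-robin assignment (scikit-learn's LabelKFold additionally balances fold sizes), not the proposed Spark API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of label-based k-fold assignment (illustrative; not Spark's API).
public class LabelKFold {
    // Returns foldIndex[i] for each sample such that all samples sharing a
    // label land in the same fold. Distinct labels are dealt round-robin
    // across the k folds in order of first appearance.
    static int[] assignFolds(String[] labels, int k) {
        Map<String, Integer> labelToFold = new LinkedHashMap<>();
        for (String label : labels) {
            // Each new label gets the next fold in round-robin order.
            labelToFold.putIfAbsent(label, labelToFold.size() % k);
        }
        int[] folds = new int[labels.length];
        for (int i = 0; i < labels.length; i++) {
            folds[i] = labelToFold.get(labels[i]);
        }
        return folds;
    }
}
```

Validating on fold j then never sees a label that was trained on, which is exactly the leakage guarantee the report asks for.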
[jira] [Updated] (SPARK-17055) add labelKFold to CrossValidator
[ https://issues.apache.org/jira/browse/SPARK-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vincent updated SPARK-17055:
----------------------------
    Affects Version/s:     (was: 2.0.0)
[jira] [Assigned] (SPARK-17055) add labelKFold to CrossValidator
[ https://issues.apache.org/jira/browse/SPARK-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-17055:
------------------------------------

    Assignee: Apache Spark
[jira] [Assigned] (SPARK-17055) add labelKFold to CrossValidator
[ https://issues.apache.org/jira/browse/SPARK-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-17055:
------------------------------------

    Assignee:     (was: Apache Spark)
[jira] [Commented] (SPARK-17055) add labelKFold to CrossValidator
[ https://issues.apache.org/jira/browse/SPARK-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15420569#comment-15420569 ]

Apache Spark commented on SPARK-17055:
--------------------------------------

User 'VinceShieh' has created a pull request for this issue:
https://github.com/apache/spark/pull/14640
[jira] [Created] (SPARK-17055) add labelKFold to CrossValidator
Vincent created SPARK-17055:
-------------------------------

             Summary: add labelKFold to CrossValidator
                 Key: SPARK-17055
                 URL: https://issues.apache.org/jira/browse/SPARK-17055
             Project: Spark
          Issue Type: New Feature
          Components: MLlib
    Affects Versions: 2.0.0
            Reporter: Vincent
            Priority: Minor

Current CrossValidator only supports k-fold, which randomly divides all the samples into k groups of samples. But in cases when data is gathered from different subjects and we want to avoid over-fitting, we want to hold out samples with certain labels from the training data and put them into the validation fold, i.e. we want to ensure that the same label is not in both the testing and training sets. Mainstream packages like Sklearn already support such a cross-validation method.
[jira] [Comment Edited] (SPARK-6235) Address various 2G limits
[ https://issues.apache.org/jira/browse/SPARK-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15420554#comment-15420554 ]

Guoqiang Li edited comment on SPARK-6235 at 8/15/16 1:53 AM:
-------------------------------------------------------------

[~hvanhovell] The main changes:

1. Replace the DiskStore method {{def getBytes(blockId: BlockId): ChunkedByteBuffer}} with {{def getBlockData(blockId: BlockId): ManagedBuffer}}.
2. ManagedBuffer's nioByteBuffer method returns a ChunkedByteBuffer.
3. Add the class {{ChunkFetchInputStream}}, used for flow control; the code is as follows:

{noformat}
package org.apache.spark.network.client;

import java.io.IOException;
import java.io.InputStream;
import java.nio.channels.ClosedChannelException;
import java.util.Iterator;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

import com.google.common.primitives.UnsignedBytes;
import io.netty.buffer.ByteBuf;
import io.netty.channel.Channel;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import org.apache.spark.network.buffer.ChunkedByteBuffer;
import org.apache.spark.network.buffer.ManagedBuffer;
import org.apache.spark.network.protocol.StreamChunkId;
import org.apache.spark.network.util.LimitedInputStream;
import org.apache.spark.network.util.TransportFrameDecoder;

public class ChunkFetchInputStream extends InputStream {
  private final Logger logger = LoggerFactory.getLogger(ChunkFetchInputStream.class);

  private final TransportResponseHandler handler;
  private final Channel channel;
  private final StreamChunkId streamId;
  private final long byteCount;
  private final ChunkReceivedCallback callback;
  private final LinkedBlockingQueue<ByteBuf> buffers = new LinkedBlockingQueue<>(1024);

  public final TransportFrameDecoder.Interceptor interceptor;

  private ByteBuf curChunk;
  private boolean isCallbacked = false;
  private long writerIndex = 0;
  private final AtomicReference<Throwable> cause = new AtomicReference<>(null);
  private final AtomicBoolean isClosed = new AtomicBoolean(false);

  public ChunkFetchInputStream(
      TransportResponseHandler handler,
      Channel channel,
      StreamChunkId streamId,
      long byteCount,
      ChunkReceivedCallback callback) {
    this.handler = handler;
    this.channel = channel;
    this.streamId = streamId;
    this.byteCount = byteCount;
    this.callback = callback;
    this.interceptor = new StreamInterceptor();
  }

  @Override
  public int read() throws IOException {
    if (isClosed.get()) return -1;
    pullChunk();
    if (curChunk != null) {
      byte b = curChunk.readByte();
      return UnsignedBytes.toInt(b);
    } else {
      return -1;
    }
  }

  @Override
  public int read(byte[] dest, int offset, int length) throws IOException {
    if (isClosed.get()) return -1;
    pullChunk();
    if (curChunk != null) {
      int amountToGet = Math.min(curChunk.readableBytes(), length);
      curChunk.readBytes(dest, offset, amountToGet);
      return amountToGet;
    } else {
      return -1;
    }
  }

  @Override
  public long skip(long bytes) throws IOException {
    if (isClosed.get()) return 0L;
    pullChunk();
    if (curChunk != null) {
      int amountToSkip = (int) Math.min(bytes, curChunk.readableBytes());
      curChunk.skipBytes(amountToSkip);
      return amountToSkip;
    } else {
      return 0L;
    }
  }

  @Override
  public void close() throws IOException {
    if (!isClosed.get()) {
      releaseCurChunk();
      isClosed.set(true);
      resetChannel();
      Iterator<ByteBuf> itr = buffers.iterator();
      while (itr.hasNext()) {
        itr.next().release();
      }
      buffers.clear();
    }
  }

  private void pullChunk() throws IOException {
    if (curChunk != null && !curChunk.isReadable()) releaseCurChunk();
    if (curChunk == null && cause.get() == null && !isClosed.get()) {
      try {
        curChunk = buffers.take();
        // if channel.read() will not be invoked automatically,
        // the method is called here
        if (!channel.config().isAutoRead()) channel.read();
      } catch (Throwable e) {
        setCause(e);
      }
    }
    if (cause.get() != null) throw new IOException(cause.get());
  }

  private void setCause(Throwable e) {
    if (cause.get() == null) cause.set(e);
  }

  private void releaseCurChunk() {
    if (curChunk != null) {
      curChunk.release();
      curChunk = null;
    }
  }

  private void onSuccess() throws IOException {
    if (isCallbacked) return;
    if (cause.get() != null) {
      callback.onFailure(streamId.chunkIndex, cause.get());
    } else {
      InputStream inputStream = new LimitedInputStream(this, byteCount);
      ManagedBuffer managedBuffer = new InputStreamManagedBuffer(inputStream, byteCount);
      callback.onSuccess(streamId.chunkIndex, managedBuffer);
    }
    isCallbacked = true;
  }

  private void resetChannel() {
    if
{noformat}
[jira] [Commented] (SPARK-6235) Address various 2G limits
[ https://issues.apache.org/jira/browse/SPARK-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15420554#comment-15420554 ]

Guoqiang Li commented on SPARK-6235:
------------------------------------

[~hvanhovell] The main changes:

1. Replace the DiskStore method {{def getBytes(blockId: BlockId): ChunkedByteBuffer}} with {{def getBlockData(blockId: BlockId): ManagedBuffer}}.
2. ManagedBuffer's nioByteBuffer method returns a ChunkedByteBuffer.
3. Add the class {{ChunkFetchInputStream}}, used for flow control.
[jira] [Commented] (SPARK-17054) SparkR can not run in yarn-cluster mode on mac os
[ https://issues.apache.org/jira/browse/SPARK-17054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15420553#comment-15420553 ]

Jeff Zhang commented on SPARK-17054:
------------------------------------

Although I can fix it by using the correct cache dir for mac OS, I am confused about why we need to download SparkR. I don't remember it being needed in Spark 1.x. Is this expected behavior? [~shivaram] [~junyangq]

> SparkR can not run in yarn-cluster mode on mac os
> -------------------------------------------------
>
>                 Key: SPARK-17054
>                 URL: https://issues.apache.org/jira/browse/SPARK-17054
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 2.0.0
>            Reporter: Jeff Zhang
>
> This is because it downloads SparkR to the wrong place.
> {noformat}
> Warning message:
> 'sparkR.init' is deprecated.
> Use 'sparkR.session' instead.
> See help("Deprecated")
> Spark not found in SPARK_HOME: .
> To search in the cache directory. Installation will start if not found.
> Mirror site not provided.
> Looking for site suggested from apache website...
> Preferred mirror site found: http://apache.mirror.cdnetworks.com/spark
> Downloading Spark spark-2.0.0 for Hadoop 2.7 from:
> - http://apache.mirror.cdnetworks.com/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz
> Fetch failed from http://apache.mirror.cdnetworks.com/spark
> open destfile '/home//Library/Caches/spark/spark-2.0.0-bin-hadoop2.7.tgz',
> reason 'No such file or directory'
> To use backup site...
> Downloading Spark spark-2.0.0 for Hadoop 2.7 from:
> - http://www-us.apache.org/dist/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz
> Fetch failed from http://www-us.apache.org/dist/spark
> open destfile '/home//Library/Caches/spark/spark-2.0.0-bin-hadoop2.7.tgz',
> reason 'No such file or directory'
> Error in robust_download_tar(mirrorUrl, version, hadoopVersion, packageName, :
>   Unable to download Spark spark-2.0.0 for Hadoop 2.7. Please check network
> connection, Hadoop version, or provide other mirror sites.
> Calls: sparkRSQL.init ... sparkR.session -> install.spark -> robust_download_tar
> In addition: Warning messages:
> 1: 'sparkRSQL.init' is deprecated.
> Use 'sparkR.session' instead.
> See help("Deprecated")
> 2: In dir.create(localDir, recursive = TRUE) :
>   cannot create dir '/home//Library', reason 'Operation not supported'
> Execution halted
> {noformat}
[jira] [Created] (SPARK-17054) SparkR can not run in yarn-cluster mode on mac os
Jeff Zhang created SPARK-17054:
----------------------------------

             Summary: SparkR can not run in yarn-cluster mode on mac os
                 Key: SPARK-17054
                 URL: https://issues.apache.org/jira/browse/SPARK-17054
             Project: Spark
          Issue Type: Bug
          Components: SparkR
    Affects Versions: 2.0.0
            Reporter: Jeff Zhang

This is because it downloads SparkR to the wrong place.
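[Editorial note] The SPARK-17054 failure boils down to building a Linux-style `/home/<user>/...` cache path on macOS, where the user cache lives under `~/Library/Caches` instead. An OS-aware choice can be sketched as follows; this is an illustrative sketch only, since the actual fix belongs in SparkR's R code (`install.spark`), and the function name here is hypothetical:

```java
// Sketch: pick a per-OS user cache directory for the Spark download
// (illustrative; SparkR implements the real logic in R, not Java).
public class CacheDir {
    static String sparkCacheDir(String osName, String userHome, String localAppData) {
        String os = osName.toLowerCase();
        if (os.contains("mac")) {
            // macOS convention: ~/Library/Caches/<app>
            return userHome + "/Library/Caches/spark";
        } else if (os.contains("win")) {
            // Windows convention: %LOCALAPPDATA%\<app>\Cache
            return localAppData + "\\spark\\Cache";
        } else {
            // XDG default on Linux and other Unixes: ~/.cache/<app>
            return userHome + "/.cache/spark";
        }
    }
}
```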
[jira] [Commented] (SPARK-16781) java launched by PySpark as gateway may not be the same java used in the spark environment
[ https://issues.apache.org/jira/browse/SPARK-16781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420547#comment-15420547 ]

Jeff Zhang commented on SPARK-16781:

JAVA_HOME will be set by YARN; I am not sure about other cluster managers.

> java launched by PySpark as gateway may not be the same java used in the
> spark environment
> --
>
> Key: SPARK-16781
> URL: https://issues.apache.org/jira/browse/SPARK-16781
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 1.6.2
> Reporter: Michael Berman
>
> When launching spark on a system with multiple javas installed, there are a
> few options for choosing which JRE to use, setting `JAVA_HOME` being the most
> straightforward.
> However, when pyspark's internal py4j launches its JavaGateway, it always
> invokes `java` directly, without qualification. This means you get whatever
> java's first on your path, which is not necessarily the same one in spark's
> JAVA_HOME.
> This could be seen as a py4j issue, but from their point of view, the fix is
> easy: make sure the java you want is first on your path. I can't figure out a
> way to make that reliably happen through the pyspark executor launch path,
> and it seems like something that would ideally happen automatically. If I set
> JAVA_HOME when launching spark, I would expect that to be the only java used
> throughout the stack.
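One way to make the gateway honor JAVA_HOME, sketched below, is to resolve the java binary from `$JAVA_HOME/bin/java` before falling back to the bare `java` on PATH. This is illustrative only: `resolve_java_command` is a hypothetical helper, not the actual py4j or PySpark launch code.

```python
import os

def resolve_java_command():
    # Prefer the JRE that JAVA_HOME points at; fall back to whatever
    # `java` is first on PATH (the unqualified invocation described above).
    java_home = os.environ.get("JAVA_HOME")
    if java_home:
        candidate = os.path.join(java_home, "bin", "java")
        if os.path.isfile(candidate):
            return candidate
    return "java"
```

With this, setting JAVA_HOME when launching Spark would be enough for the gateway JVM to match the rest of the stack, without having to reorder PATH.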
[jira] [Commented] (SPARK-14165) NoSuchElementException: None.get when joining DataFrames with Seq of fields of different case
[ https://issues.apache.org/jira/browse/SPARK-14165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420541#comment-15420541 ] Jacek Laskowski commented on SPARK-14165: - Thanks [~dongjoon] for looking into it. Yes, the more general join works fine. I think the other example where the join expression is given should be fixed. > NoSuchElementException: None.get when joining DataFrames with Seq of fields > of different case > - > > Key: SPARK-14165 > URL: https://issues.apache.org/jira/browse/SPARK-14165 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Jacek Laskowski >Priority: Minor > > {code} > scala> val left = Seq((1,"a")).toDF("id", "abc") > left: org.apache.spark.sql.DataFrame = [id: int, abc: string] > scala> val right = Seq((1,"a")).toDF("id", "ABC") > right: org.apache.spark.sql.DataFrame = [id: int, ABC: string] > scala> left.join(right, Seq("abc")) > java.util.NoSuchElementException: None.get > at scala.None$.get(Option.scala:347) > at scala.None$.get(Option.scala:345) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$62.apply(Analyzer.scala:1444) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$62.apply(Analyzer.scala:1444) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) > at scala.collection.immutable.List.foreach(List.scala:381) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:245) > at scala.collection.immutable.List.map(List.scala:285) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$commonNaturalJoinProcessing(Analyzer.scala:1444) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNaturalAndUsingJoin$$anonfun$apply$29.applyOrElse(Analyzer.scala:1426) > at > 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNaturalAndUsingJoin$$anonfun$apply$29.applyOrElse(Analyzer.scala:1418) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:58) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:58) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:67) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:57) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNaturalAndUsingJoin$.apply(Analyzer.scala:1418) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNaturalAndUsingJoin$.apply(Analyzer.scala:1417) > at > org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:83) > at > org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:80) > at > scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124) > at scala.collection.immutable.List.foldLeft(List.scala:84) > at > org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:80) > at > org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:72) > at scala.collection.immutable.List.foreach(List.scala:381) > at > org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:72) > at > org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:41) > at > org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:41) > at > org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:34) > at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:58) > at > org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withPlan(Dataset.scala:2299) > at 
org.apache.spark.sql.Dataset.join(Dataset.scala:553) > at org.apache.spark.sql.Dataset.join(Dataset.scala:526) > ... 51 elided > {code}
[jira] [Commented] (SPARK-17035) Conversion of datetime.max to microseconds produces incorrect value
[ https://issues.apache.org/jira/browse/SPARK-17035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420534#comment-15420534 ]

Michael Styles commented on SPARK-17035:

I have a fix for this issue if you would like to assign the problem to me.

> Conversion of datetime.max to microseconds produces incorrect value
> ---
>
> Key: SPARK-17035
> URL: https://issues.apache.org/jira/browse/SPARK-17035
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 2.0.0
> Reporter: Michael Styles
> Priority: Minor
>
> Conversion of datetime.max to microseconds produces incorrect value. For
> example,
> {noformat}
> from datetime import datetime
> from pyspark.sql import Row
> from pyspark.sql.types import StructType, StructField, TimestampType
> schema = StructType([StructField("dt", TimestampType(), False)])
> data = [{"dt": datetime.max}]
> # convert python objects to sql data
> sql_data = [schema.toInternal(row) for row in data]
> # Value is wrong.
> sql_data
> [(2.534023188e+17,)]
> {noformat}
> This value should be [(2534023187,)].
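The symptom (a float such as `2.534023188e+17`) points at floating-point arithmetic: at the magnitude of `datetime.max`, a double cannot represent every integer count of microseconds, so the low digits are lost. A standalone sketch of the failure mode and an integer-arithmetic alternative; this is illustrative, not Spark's actual `toInternal` code, and it ignores time zones for simplicity.

```python
from datetime import datetime

EPOCH = datetime(1970, 1, 1)

def to_microseconds_float(dt):
    # Float path: total_seconds() is a double, and beyond 2**53 a double
    # cannot hold every integer, so multiplying by 1e6 rounds the result.
    return (dt - EPOCH).total_seconds() * 1e6

def to_microseconds_int(dt):
    # Pure integer arithmetic keeps the exact microsecond count.
    delta = dt - EPOCH
    return (delta.days * 86400 + delta.seconds) * 10**6 + delta.microseconds

dt = datetime.max  # 9999-12-31 23:59:59.999999
print(to_microseconds_float(dt))  # rounded: the trailing ...999999 is lost
print(to_microseconds_int(dt))    # exact: ends in ...999999
```

The fix direction is therefore to keep the conversion in integers end to end rather than going through a float intermediate.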
[jira] [Commented] (SPARK-14165) NoSuchElementException: None.get when joining DataFrames with Seq of fields of different case
[ https://issues.apache.org/jira/browse/SPARK-14165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420527#comment-15420527 ] Dongjoon Hyun commented on SPARK-14165: --- I'm wondering if we need to fix the example in your comment. This is still the same. {code} scala> left.join(right, $"abc" === $"ABC") org.apache.spark.sql.AnalysisException: Reference 'abc' is ambiguous, could be: abc#6, abc#16.; {code} > NoSuchElementException: None.get when joining DataFrames with Seq of fields > of different case > - > > Key: SPARK-14165 > URL: https://issues.apache.org/jira/browse/SPARK-14165 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Jacek Laskowski >Priority: Minor > > {code} > scala> val left = Seq((1,"a")).toDF("id", "abc") > left: org.apache.spark.sql.DataFrame = [id: int, abc: string] > scala> val right = Seq((1,"a")).toDF("id", "ABC") > right: org.apache.spark.sql.DataFrame = [id: int, ABC: string] > scala> left.join(right, Seq("abc")) > java.util.NoSuchElementException: None.get > at scala.None$.get(Option.scala:347) > at scala.None$.get(Option.scala:345) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$62.apply(Analyzer.scala:1444) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$62.apply(Analyzer.scala:1444) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) > at scala.collection.immutable.List.foreach(List.scala:381) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:245) > at scala.collection.immutable.List.map(List.scala:285) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$commonNaturalJoinProcessing(Analyzer.scala:1444) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNaturalAndUsingJoin$$anonfun$apply$29.applyOrElse(Analyzer.scala:1426) > at > 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNaturalAndUsingJoin$$anonfun$apply$29.applyOrElse(Analyzer.scala:1418) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:58) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:58) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:67) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:57) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNaturalAndUsingJoin$.apply(Analyzer.scala:1418) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNaturalAndUsingJoin$.apply(Analyzer.scala:1417) > at > org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:83) > at > org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:80) > at > scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124) > at scala.collection.immutable.List.foldLeft(List.scala:84) > at > org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:80) > at > org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:72) > at scala.collection.immutable.List.foreach(List.scala:381) > at > org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:72) > at > org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:41) > at > org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:41) > at > org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:34) > at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:58) > at > org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withPlan(Dataset.scala:2299) > at 
org.apache.spark.sql.Dataset.join(Dataset.scala:553) > at org.apache.spark.sql.Dataset.join(Dataset.scala:526) > ... 51 elided > {code}
[jira] [Commented] (SPARK-14165) NoSuchElementException: None.get when joining DataFrames with Seq of fields of different case
[ https://issues.apache.org/jira/browse/SPARK-14165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420525#comment-15420525 ] Dongjoon Hyun commented on SPARK-14165: --- Hi, [~ja...@japila.pl]. Spark 2.0 seems to be released without this problem. {code} scala> val left = Seq((1,"a")).toDF("id", "abc") scala> val right = Seq((1,"a")).toDF("id", "ABC") scala> left.join(right, Seq("abc")).show +---+---+---+ |abc| id| id| +---+---+---+ | a| 1| 1| +---+---+---+ scala> spark.version res1: String = 2.0.0 {code} Could you confirm this? > NoSuchElementException: None.get when joining DataFrames with Seq of fields > of different case > - > > Key: SPARK-14165 > URL: https://issues.apache.org/jira/browse/SPARK-14165 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Jacek Laskowski >Priority: Minor > > {code} > scala> val left = Seq((1,"a")).toDF("id", "abc") > left: org.apache.spark.sql.DataFrame = [id: int, abc: string] > scala> val right = Seq((1,"a")).toDF("id", "ABC") > right: org.apache.spark.sql.DataFrame = [id: int, ABC: string] > scala> left.join(right, Seq("abc")) > java.util.NoSuchElementException: None.get > at scala.None$.get(Option.scala:347) > at scala.None$.get(Option.scala:345) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$62.apply(Analyzer.scala:1444) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$62.apply(Analyzer.scala:1444) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) > at scala.collection.immutable.List.foreach(List.scala:381) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:245) > at scala.collection.immutable.List.map(List.scala:285) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$commonNaturalJoinProcessing(Analyzer.scala:1444) > at > 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNaturalAndUsingJoin$$anonfun$apply$29.applyOrElse(Analyzer.scala:1426) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNaturalAndUsingJoin$$anonfun$apply$29.applyOrElse(Analyzer.scala:1418) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:58) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:58) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:67) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:57) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNaturalAndUsingJoin$.apply(Analyzer.scala:1418) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNaturalAndUsingJoin$.apply(Analyzer.scala:1417) > at > org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:83) > at > org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:80) > at > scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124) > at scala.collection.immutable.List.foldLeft(List.scala:84) > at > org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:80) > at > org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:72) > at scala.collection.immutable.List.foreach(List.scala:381) > at > org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:72) > at > org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:41) > at > org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:41) > at > org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:34) > at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:58) > at > 
org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withPlan(Dataset.scala:2299) > at org.apache.spark.sql.Dataset.join(Dataset.scala:553) > at org.apache.spark.sql.Dataset.join(Dataset.scala:526) > ... 51 elided > {code}
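The working 2.0 behavior shown in the comment above (`Seq("abc")` matching the `ABC` column) is what case-insensitive resolution of a USING-join column produces. A rough Python sketch of that matching rule, purely illustrative and not the actual Analyzer code:

```python
def resolve_using_column(name, columns, case_sensitive=False):
    # Find the output column matching a USING/join column name.
    # With case-insensitive resolution (Spark SQL's default), "abc"
    # matches "ABC"; with case-sensitive resolution it does not, and
    # an unresolved name is what surfaced as None.get in the trace above.
    for col in columns:
        if col == name or (not case_sensitive and col.lower() == name.lower()):
            return col
    return None

assert resolve_using_column("abc", ["id", "ABC"]) == "ABC"
assert resolve_using_column("abc", ["id", "ABC"], case_sensitive=True) is None
```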
[jira] [Commented] (SPARK-17053) Spark ignores hive.exec.drop.ignorenonexistent=true option
[ https://issues.apache.org/jira/browse/SPARK-17053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420515#comment-15420515 ]

Dongjoon Hyun commented on SPARK-17053:
---

Yep. I closed the PR, too. Thank you for the quick decision, [~rxin].

> Spark ignores hive.exec.drop.ignorenonexistent=true option
> --
>
> Key: SPARK-17053
> URL: https://issues.apache.org/jira/browse/SPARK-17053
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Gokhan Civan
>
> In version 1.6.1, the following does not throw an exception:
> create table a as select 1; drop table a; drop table a;
> In version 2.0.0, the second drop fails; this is not compatible with Hive.
> The same problem exists for views.
[jira] [Closed] (SPARK-17053) Spark ignores hive.exec.drop.ignorenonexistent=true option
[ https://issues.apache.org/jira/browse/SPARK-17053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin closed SPARK-17053.
---
Resolution: Won't Fix

> Spark ignores hive.exec.drop.ignorenonexistent=true option
> --
>
> Key: SPARK-17053
> URL: https://issues.apache.org/jira/browse/SPARK-17053
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Gokhan Civan
>
> In version 1.6.1, the following does not throw an exception:
> create table a as select 1; drop table a; drop table a;
> In version 2.0.0, the second drop fails; this is not compatible with Hive.
> The same problem exists for views.
[jira] [Commented] (SPARK-17053) Spark ignores hive.exec.drop.ignorenonexistent=true option
[ https://issues.apache.org/jira/browse/SPARK-17053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420507#comment-15420507 ]

Reynold Xin commented on SPARK-17053:
-

It was by accident that this was supported, because Spark simply used Hive's code to do all DDL operations. In Spark 2.0, Spark implemented all the DDLs natively in Spark, and it is not the intention of the project to support all the esoteric features provided by Hive.

> Spark ignores hive.exec.drop.ignorenonexistent=true option
> --
>
> Key: SPARK-17053
> URL: https://issues.apache.org/jira/browse/SPARK-17053
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Gokhan Civan
>
> In version 1.6.1, the following does not throw an exception:
> create table a as select 1; drop table a; drop table a;
> In version 2.0.0, the second drop fails; this is not compatible with Hive.
> The same problem exists for views.
[jira] [Comment Edited] (SPARK-17053) Spark ignores hive.exec.drop.ignorenonexistent=true option
[ https://issues.apache.org/jira/browse/SPARK-17053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420507#comment-15420507 ]

Reynold Xin edited comment on SPARK-17053 at 8/14/16 10:32 PM:
---

It was by accident that this was supported, because Spark simply used Hive's code to do all DDL operations. In Spark 2.0, Spark implemented all the DDLs natively in Spark, and it is not the intention of the project to support all the esoteric feature flags provided by Hive.

was (Author: rxin):
It was by accident that this was supported, because Spark simply used Hive's code to do all DDL operations. In Spark 2.0, Spark implemented all the DDLs natively in Spark, and it is not the intention of the project to support all the esoteric features provided by Hive.

> Spark ignores hive.exec.drop.ignorenonexistent=true option
> --
>
> Key: SPARK-17053
> URL: https://issues.apache.org/jira/browse/SPARK-17053
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Gokhan Civan
>
> In version 1.6.1, the following does not throw an exception:
> create table a as select 1; drop table a; drop table a;
> In version 2.0.0, the second drop fails; this is not compatible with Hive.
> The same problem exists for views.
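The semantics under discussion can be sketched as follows. In Hive, `hive.exec.drop.ignorenonexistent=true` makes a plain `DROP TABLE` behave like `DROP TABLE IF EXISTS`; Spark 2.0's native DDL honors only the explicit `IF EXISTS` clause. The catalog model below is hypothetical, neither Spark's nor Hive's actual code.

```python
def drop_table(catalog, name, if_exists=False, ignore_nonexistent=False):
    # ignore_nonexistent plays the role of hive.exec.drop.ignorenonexistent.
    if name not in catalog:
        if if_exists or ignore_nonexistent:
            return False  # silently ignore the missing table or view
        raise LookupError("Table or view not found: %s" % name)
    del catalog[name]
    return True

catalog = {"a": "..."}
drop_table(catalog, "a")                           # first drop succeeds
drop_table(catalog, "a", ignore_nonexistent=True)  # Hive-style: no error
```

Without the flag (and without `if_exists`), the second drop raises, which is exactly the 2.0.0 behavior the reporter describes.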
[jira] [Commented] (SPARK-11374) skip.header.line.count is ignored in HiveContext
[ https://issues.apache.org/jira/browse/SPARK-11374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420503#comment-15420503 ]

Dongjoon Hyun commented on SPARK-11374:
---

Hi [~stephane.maa...@gmail.com],
Thank you for the comments. Yep, I noticed that option too, but it seems trickier. The current approach of the Spark Scala API and my PR is to check whether the partition's file start position is zero, so it is not straightforward to apply to the footer option. For this issue, I think that is acceptable since the Spark Scala API already supports the `header` option. For the `footer` option, however, I think we need a new JIRA issue to get some attention and build consensus for it.
Thanks,
Dongjoon.

> skip.header.line.count is ignored in HiveContext
>
>
> Key: SPARK-11374
> URL: https://issues.apache.org/jira/browse/SPARK-11374
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.5.1
> Reporter: Daniel Haviv
>
> csv table in Hive which is configured to skip the header row using
> TBLPROPERTIES("skip.header.line.count"="1").
> When querying from Hive the header row is not included in the data, but when
> running the same query via HiveContext I get the header row.
> "show create table " via the HiveContext confirms that it is aware of the
> setting.
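The start-position check described in the comment above can be sketched as follows (illustrative Python, not the actual PR code):

```python
def lines_for_split(split_lines, split_start_offset, header_line_count=1):
    # Only the split that begins at byte offset 0 contains the file's
    # header lines, so only that split skips them. A footer, by contrast,
    # lives in whichever split ends at the file's last byte, which this
    # start-offset test cannot identify; that is why the footer case is
    # harder and was proposed for a separate JIRA.
    if split_start_offset == 0:
        return split_lines[header_line_count:]
    return split_lines
```

For example, a split starting at offset 0 with lines `["header", "row1"]` yields `["row1"]`, while a later split starting at offset 1024 is returned unchanged.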
[jira] [Commented] (SPARK-11374) skip.header.line.count is ignored in HiveContext
[ https://issues.apache.org/jira/browse/SPARK-11374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420491#comment-15420491 ]

Stephane Maarek commented on SPARK-11374:
-

Hi,
Thanks for the PR. Can you also test for the footer option? Might as well solve both issues.
Thanks,
Stéphane

> skip.header.line.count is ignored in HiveContext
>
>
> Key: SPARK-11374
> URL: https://issues.apache.org/jira/browse/SPARK-11374
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.5.1
> Reporter: Daniel Haviv
>
> csv table in Hive which is configured to skip the header row using
> TBLPROPERTIES("skip.header.line.count"="1").
> When querying from Hive the header row is not included in the data, but when
> running the same query via HiveContext I get the header row.
> "show create table " via the HiveContext confirms that it is aware of the
> setting.
[jira] [Commented] (SPARK-11374) skip.header.line.count is ignored in HiveContext
[ https://issues.apache.org/jira/browse/SPARK-11374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420489#comment-15420489 ]

Apache Spark commented on SPARK-11374:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/14638

> skip.header.line.count is ignored in HiveContext
>
>
> Key: SPARK-11374
> URL: https://issues.apache.org/jira/browse/SPARK-11374
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.5.1
> Reporter: Daniel Haviv
>
> csv table in Hive which is configured to skip the header row using
> TBLPROPERTIES("skip.header.line.count"="1").
> When querying from Hive the header row is not included in the data, but when
> running the same query via HiveContext I get the header row.
> "show create table " via the HiveContext confirms that it is aware of the
> setting.
[jira] [Assigned] (SPARK-11374) skip.header.line.count is ignored in HiveContext
[ https://issues.apache.org/jira/browse/SPARK-11374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-11374:

Assignee: (was: Apache Spark)

> skip.header.line.count is ignored in HiveContext
>
>
> Key: SPARK-11374
> URL: https://issues.apache.org/jira/browse/SPARK-11374
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.5.1
> Reporter: Daniel Haviv
>
> csv table in Hive which is configured to skip the header row using
> TBLPROPERTIES("skip.header.line.count"="1").
> When querying from Hive the header row is not included in the data, but when
> running the same query via HiveContext I get the header row.
> "show create table " via the HiveContext confirms that it is aware of the
> setting.
[jira] [Assigned] (SPARK-11374) skip.header.line.count is ignored in HiveContext
[ https://issues.apache.org/jira/browse/SPARK-11374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-11374:

Assignee: Apache Spark

> skip.header.line.count is ignored in HiveContext
>
>
> Key: SPARK-11374
> URL: https://issues.apache.org/jira/browse/SPARK-11374
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.5.1
> Reporter: Daniel Haviv
> Assignee: Apache Spark
>
> csv table in Hive which is configured to skip the header row using
> TBLPROPERTIES("skip.header.line.count"="1").
> When querying from Hive the header row is not included in the data, but when
> running the same query via HiveContext I get the header row.
> "show create table " via the HiveContext confirms that it is aware of the
> setting.
[jira] [Commented] (SPARK-2243) Support multiple SparkContexts in the same JVM
[ https://issues.apache.org/jira/browse/SPARK-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420467#comment-15420467 ]

Stephen Boesch commented on SPARK-2243:
---

Given that this is not going to be fixed, please update the documentation and fix the following warning:

WARN SparkContext: Multiple running SparkContexts detected in the same JVM!
org.apache.spark.SparkException: Only one SparkContext may be running in this JVM (see SPARK-2243). To ignore this error, set spark.driver.allowMultipleContexts = true

> Support multiple SparkContexts in the same JVM
> --
>
> Key: SPARK-2243
> URL: https://issues.apache.org/jira/browse/SPARK-2243
> Project: Spark
> Issue Type: New Feature
> Components: Block Manager, Spark Core
> Affects Versions: 0.7.0, 1.0.0, 1.1.0
> Reporter: Miguel Angel Fernandez Diaz
>
> We're developing a platform where we create several Spark contexts for
> carrying out different calculations. Is there any restriction when using
> several Spark contexts? We have two contexts, one for Spark calculations and
> another one for Spark Streaming jobs.
The next error arises when we first > execute a Spark calculation and, once the execution is finished, a Spark > Streaming job is launched: > {code} > 14/06/23 16:40:08 ERROR executor.Executor: Exception in task ID 0 > java.io.FileNotFoundException: http://172.19.0.215:47530/broadcast_0 > at > sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1624) > at > org.apache.spark.broadcast.HttpBroadcast$.read(HttpBroadcast.scala:156) > at > org.apache.spark.broadcast.HttpBroadcast.readObject(HttpBroadcast.scala:56) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017) > at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) > at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) > at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) > at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) > at > org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40) > at > org.apache.spark.scheduler.ResultTask$.deserializeInfo(ResultTask.scala:63) > at > org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:139) > at > java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1837) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796) > at 
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) > at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) > at > org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40) > at > org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:62) > at > org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:193) > at > org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:45) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > 14/06/23 16:40:08 WARN scheduler.TaskSetManager: Lost TID 0 (task 0.0:0) > 14/06/23 16:40:08 WARN scheduler.TaskSetManager: Loss was due to > java.io.FileNotFoundException > java.io.FileNotFoundException: http://172.19.0.215:47530/broadcast_0 > at > sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1624) > at > org.apache.spark.broadcast.HttpBroadcast$.read(HttpBroadcast.scala:156) > at > org.apache.spark.broadcast.HttpBroadcast.readObject(HttpBroadcast.scala:56) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at >
[jira] [Commented] (SPARK-17053) Spark ignores hive.exec.drop.ignorenonexistent=true option
[ https://issues.apache.org/jira/browse/SPARK-17053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420456#comment-15420456 ]

Dongjoon Hyun commented on SPARK-17053:
---

Oh, I see!

> Spark ignores hive.exec.drop.ignorenonexistent=true option
> --
>
> Key: SPARK-17053
> URL: https://issues.apache.org/jira/browse/SPARK-17053
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Gokhan Civan
>
> In version 1.6.1, the following does not throw an exception:
> create table a as select 1; drop table a; drop table a;
> In version 2.0.0, the second drop fails; this is not compatible with Hive.
> The same problem exists for views.
[jira] [Updated] (SPARK-17053) Spark ignores hive.exec.drop.ignorenonexistent=true option
[ https://issues.apache.org/jira/browse/SPARK-17053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gokhan Civan updated SPARK-17053: - Summary: Spark ignores hive.exec.drop.ignorenonexistent=true option (was: DROP statement should not require IF EXISTS)
[jira] [Assigned] (SPARK-17053) DROP statement should not require IF EXISTS
[ https://issues.apache.org/jira/browse/SPARK-17053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17053: Assignee: (was: Apache Spark)
[jira] [Assigned] (SPARK-17053) DROP statement should not require IF EXISTS
[ https://issues.apache.org/jira/browse/SPARK-17053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17053: Assignee: Apache Spark
[jira] [Commented] (SPARK-17053) DROP statement should not require IF EXISTS
[ https://issues.apache.org/jira/browse/SPARK-17053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420443#comment-15420443 ] Qifan Pu commented on SPARK-17053: -- [~dongjoon] Sorry, it was an accidental click.
[jira] [Comment Edited] (SPARK-17053) DROP statement should not require IF EXISTS
[ https://issues.apache.org/jira/browse/SPARK-17053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420430#comment-15420430 ] Dongjoon Hyun edited comment on SPARK-17053 at 8/14/16 6:32 PM: Oh, I mentioned the wrong username. Hi, [~qifan]. Could you leave a comment for the audit trail when you close an issue as 'WON'T FIX'? was (Author: dongjoon): Oh, I mentioned wrong username. Hi, [~qifan]. Could you leave some comment when you close issue as 'WON'T FIX' for the audit?
[jira] [Commented] (SPARK-17053) DROP statement should not require IF EXISTS
[ https://issues.apache.org/jira/browse/SPARK-17053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420430#comment-15420430 ] Dongjoon Hyun commented on SPARK-17053: --- Oh, I mentioned the wrong username. Hi, [~qifan]. Could you leave a comment for the audit trail when you close an issue as 'WON'T FIX'?
[jira] [Reopened] (SPARK-17053) DROP statement should not require IF EXISTS
[ https://issues.apache.org/jira/browse/SPARK-17053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reopened SPARK-17053: --- Hi, [~superpanpan]. Could you leave a comment explaining why you closed this issue as `WON'T FIX`?
[jira] [Assigned] (SPARK-16967) Collect Mesos support code into a module/profile
[ https://issues.apache.org/jira/browse/SPARK-16967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-16967: Assignee: (was: Apache Spark) > Collect Mesos support code into a module/profile > > > Key: SPARK-16967 > URL: https://issues.apache.org/jira/browse/SPARK-16967 > Project: Spark > Issue Type: Task > Components: Mesos, Spark Core >Affects Versions: 2.0.0 >Reporter: Sean Owen >Priority: Critical > > CC [~mgummelt] [~tnachen] [~skonto] > I think this is fairly easy and would be beneficial as more work goes into > Mesos. It should separate into a module like YARN does, just on principle > really, but because it also means anyone that doesn't need Mesos support can > build without it. > I'm entirely willing to take a shot at this.
[jira] [Commented] (SPARK-16967) Collect Mesos support code into a module/profile
[ https://issues.apache.org/jira/browse/SPARK-16967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420425#comment-15420425 ] Apache Spark commented on SPARK-16967: -- User 'mgummelt' has created a pull request for this issue: https://github.com/apache/spark/pull/14637
[jira] [Assigned] (SPARK-16967) Collect Mesos support code into a module/profile
[ https://issues.apache.org/jira/browse/SPARK-16967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-16967: Assignee: Apache Spark
[jira] [Resolved] (SPARK-17053) DROP statement should not require IF EXISTS
[ https://issues.apache.org/jira/browse/SPARK-17053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qifan Pu resolved SPARK-17053. -- Resolution: Won't Fix
[jira] [Assigned] (SPARK-17053) DROP statement should not require IF EXISTS
[ https://issues.apache.org/jira/browse/SPARK-17053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17053: Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-17053) DROP statement should not require IF EXISTS
[ https://issues.apache.org/jira/browse/SPARK-17053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420421#comment-15420421 ] Apache Spark commented on SPARK-17053: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/14636
[jira] [Assigned] (SPARK-17053) DROP statement should not require IF EXISTS
[ https://issues.apache.org/jira/browse/SPARK-17053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17053: Assignee: Apache Spark
[jira] [Commented] (SPARK-17053) DROP statement should not require IF EXISTS
[ https://issues.apache.org/jira/browse/SPARK-17053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420405#comment-15420405 ] Dongjoon Hyun commented on SPARK-17053: --- Oh, I see what you mean. Yes, indeed. Currently, Spark ignores the option. {code} scala> sql("set hive.exec.drop.ignorenonexistent=true") res1: org.apache.spark.sql.DataFrame = [key: string, value: string] scala> sql("drop table a") org.apache.spark.sql.AnalysisException: Table to drop '`a`' does not exist; {code} I'll make a PR soon. Could you update the title to be more specific?
[jira] [Commented] (SPARK-17053) DROP statement should not require IF EXISTS
[ https://issues.apache.org/jira/browse/SPARK-17053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420398#comment-15420398 ] Gokhan Civan commented on SPARK-17053: -- I have seen a lot of code bases that use DROP without IF EXISTS, and found it strange. I guess there was an ignorenonexistent variable lurking around somewhere. So was this variable somehow built into 1.6.1 and later removed?
[jira] [Commented] (SPARK-17053) DROP statement should not require IF EXISTS
[ https://issues.apache.org/jira/browse/SPARK-17053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420392#comment-15420392 ] Sean Owen commented on SPARK-17053: --- That would imply that "IF EXISTS" is redundant, i.e. that the behavior is always "IF EXISTS". The Hive docs also disagree: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL In Hive 0.7.0 or later, DROP returns an error if the table doesn't exist, unless IF EXISTS is specified or the configuration variable hive.exec.drop.ignorenonexistent is set to true.
[jira] [Created] (SPARK-17053) DROP statement should not require IF EXISTS
Gokhan Civan created SPARK-17053: Summary: DROP statement should not require IF EXISTS Key: SPARK-17053 URL: https://issues.apache.org/jira/browse/SPARK-17053 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: Gokhan Civan In version 1.6.1, the following does not throw an exception: create table a as select 1; drop table a; drop table a; In version 2.0.0, the second drop fails; this is not compatible with Hive. The same problem exists for views.
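The DROP semantics this thread converges on (per the Hive docs quoted above: dropping a missing table is an error unless IF EXISTS is given or hive.exec.drop.ignorenonexistent=true is set) can be sketched in a few lines. This is a hypothetical, Spark-free illustration of the rule, not Spark's or Hive's actual implementation; `drop_table`, the `catalog` set, and the Python `AnalysisException` class are made-up stand-ins:

```python
class AnalysisException(Exception):
    """Stand-in for Spark's analysis error on a missing table."""


def drop_table(catalog, name, if_exists=False, conf=None):
    """Drop `name` from the catalog (a set of table names).

    Missing tables raise, unless IF EXISTS was written in the statement
    or the session config ignores nonexistent tables (Hive semantics).
    """
    conf = conf or {}
    ignore_missing = conf.get("hive.exec.drop.ignorenonexistent", "false") == "true"
    if name not in catalog:
        if if_exists or ignore_missing:
            return  # silently ignore the missing table
        raise AnalysisException("Table to drop '`%s`' does not exist;" % name)
    catalog.remove(name)
```

Under these semantics the reported repro (`drop table a` twice) fails on the second drop unless the config variable is set, which matches the 2.0.0 behavior described in the issue.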
[jira] [Assigned] (SPARK-17052) Remove Duplicate Test Cases auto_join from HiveCompatibilitySuite.scala
[ https://issues.apache.org/jira/browse/SPARK-17052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17052: Assignee: (was: Apache Spark) > Remove Duplicate Test Cases auto_join from HiveCompatibilitySuite.scala > --- > > Key: SPARK-17052 > URL: https://issues.apache.org/jira/browse/SPARK-17052 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li >Priority: Minor > > The original [JIRA > HIVE-1642](https://issues.apache.org/jira/browse/HIVE-1642) delivered the > test cases `auto_joinXYZ` for verifying the results when the joins are > automatically converted to map-join. Basically, most of them are just copied > from the corresponding `joinXYZ`. > After comparison between `auto_joinXYZ` and `joinXYZ`, below is a list of > duplicate cases: > {noformat} > "auto_join0", > "auto_join1", > "auto_join10", > "auto_join11", > "auto_join12", > "auto_join13", > "auto_join14", > "auto_join14_hadoop20", > "auto_join15", > "auto_join17", > "auto_join18", > "auto_join2", > "auto_join20", > "auto_join21", > "auto_join23", > "auto_join24", > "auto_join3", > "auto_join4", > "auto_join5", > "auto_join6", > "auto_join7", > "auto_join8", > "auto_join9" > {noformat} > We can remove all of them without affecting the test coverage.
[jira] [Assigned] (SPARK-17052) Remove Duplicate Test Cases auto_join from HiveCompatibilitySuite.scala
[ https://issues.apache.org/jira/browse/SPARK-17052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17052: Assignee: Apache Spark
[jira] [Commented] (SPARK-17052) Remove Duplicate Test Cases auto_join from HiveCompatibilitySuite.scala
[ https://issues.apache.org/jira/browse/SPARK-17052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420385#comment-15420385 ] Apache Spark commented on SPARK-17052: -- User 'gatorsmile' has created a pull request for this issue: https://github.com/apache/spark/pull/14635
[jira] [Updated] (SPARK-17052) Remove Duplicate Test Cases auto_join from HiveCompatibilitySuite.scala
[ https://issues.apache.org/jira/browse/SPARK-17052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-17052: Priority: Minor (was: Major)
[jira] [Created] (SPARK-17052) Remove Duplicate Test Cases auto_join from HiveCompatibilitySuite.scala
Xiao Li created SPARK-17052: --- Summary: Remove Duplicate Test Cases auto_join from HiveCompatibilitySuite.scala Key: SPARK-17052 URL: https://issues.apache.org/jira/browse/SPARK-17052 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.0.0 Reporter: Xiao Li The original [JIRA HIVE-1642](https://issues.apache.org/jira/browse/HIVE-1642) delivered the test cases `auto_joinXYZ` for verifying the results when the joins are automatically converted to map-join. Basically, most of them are just copied from the corresponding `joinXYZ`. After comparison between `auto_joinXYZ` and `joinXYZ`, below is a list of duplicate cases: {noformat} "auto_join0", "auto_join1", "auto_join10", "auto_join11", "auto_join12", "auto_join13", "auto_join14", "auto_join14_hadoop20", "auto_join15", "auto_join17", "auto_join18", "auto_join2", "auto_join20", "auto_join21", "auto_join23", "auto_join24", "auto_join3", "auto_join4", "auto_join5", "auto_join6", "auto_join7", "auto_join8", "auto_join9" {noformat} We can remove all of them without affecting the test coverage.
[jira] [Commented] (SPARK-17051) we should use hadoopConf in InsertIntoHiveTable
[ https://issues.apache.org/jira/browse/SPARK-17051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420350#comment-15420350 ] Apache Spark commented on SPARK-17051: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/14634 > we should use hadoopConf in InsertIntoHiveTable > --- > > Key: SPARK-17051 > URL: https://issues.apache.org/jira/browse/SPARK-17051 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Wenchen Fan >Assignee: Wenchen Fan
[jira] [Assigned] (SPARK-17051) we should use hadoopConf in InsertIntoHiveTable
[ https://issues.apache.org/jira/browse/SPARK-17051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17051: Assignee: Wenchen Fan (was: Apache Spark)
[jira] [Assigned] (SPARK-17051) we should use hadoopConf in InsertIntoHiveTable
[ https://issues.apache.org/jira/browse/SPARK-17051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17051: Assignee: Apache Spark (was: Wenchen Fan)
[jira] [Created] (SPARK-17051) we should use hadoopConf in InsertIntoHiveTable
Wenchen Fan created SPARK-17051: --- Summary: we should use hadoopConf in InsertIntoHiveTable Key: SPARK-17051 URL: https://issues.apache.org/jira/browse/SPARK-17051 Project: Spark Issue Type: Bug Components: SQL Reporter: Wenchen Fan Assignee: Wenchen Fan
[jira] [Commented] (SPARK-17050) Improve initKMeansParallel with treeAggregate
[ https://issues.apache.org/jira/browse/SPARK-17050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420336#comment-15420336 ] Sean Owen commented on SPARK-17050: --- [~WeichenXu123] this sounds quite related to https://issues.apache.org/jira/browse/SPARK-17033 . I don't think we should open JIRAs for the exact same logical change in different classes. Let's put these together into SPARK-17033. > Improve initKMeansParallel with treeAggregate > - > > Key: SPARK-17050 > URL: https://issues.apache.org/jira/browse/SPARK-17050 > Project: Spark > Issue Type: Improvement >Reporter: Weichen Xu > Original Estimate: 24h > Remaining Estimate: 24h > > The `initKMeansParallel` use `rdd.aggregate`, it is better to use > `treeAggregate` to get better performance.
[jira] [Assigned] (SPARK-17050) Improve initKMeansParallel with treeAggregate
[ https://issues.apache.org/jira/browse/SPARK-17050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17050: Assignee: Apache Spark
[jira] [Assigned] (SPARK-17050) Improve initKMeansParallel with treeAggregate
[ https://issues.apache.org/jira/browse/SPARK-17050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17050: Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-17050) Improve initKMeansParallel with treeAggregate
[ https://issues.apache.org/jira/browse/SPARK-17050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420325#comment-15420325 ] Apache Spark commented on SPARK-17050: -- User 'WeichenXu123' has created a pull request for this issue: https://github.com/apache/spark/pull/14628
[jira] [Created] (SPARK-17050) Improve initKMeansParallel with treeAggregate
Weichen Xu created SPARK-17050: -- Summary: Improve initKMeansParallel with treeAggregate Key: SPARK-17050 URL: https://issues.apache.org/jira/browse/SPARK-17050 Project: Spark Issue Type: Improvement Reporter: Weichen Xu `initKMeansParallel` currently uses `rdd.aggregate`; it would be better to use `treeAggregate` for better performance.
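The entries above are terse, so here is a sketch of why `treeAggregate` helps. With `rdd.aggregate`, every partition's partial result is sent to the driver and merged there in a single pass; `treeAggregate` first merges partial results pairwise in rounds on the executors, so the driver only combines a handful of values. The stand-in below uses plain Python (no Spark), with nested lists playing the role of RDD partitions; all names are illustrative, not Spark APIs:

```python
def tree_combine(partials, fanout=2):
    """Merge partial results level by level, the way treeAggregate does,
    instead of folding them all on the driver in one pass."""
    level = list(partials)
    while len(level) > 1:
        # One "tree round": each executor-side merge combines `fanout` partials.
        level = [sum(level[i:i + fanout])
                 for i in range(0, len(level), fanout)]
    return level[0] if level else 0

# Pretend each inner list is one RDD partition's data.
partitions = [[1.0, 2.0], [3.0, 4.0], [5.0], [6.0, 7.0]]
partials = [sum(p) for p in partitions]   # the seqOp, computed per partition
total = tree_combine(partials)            # the combOp, applied tree-wise
print(total)                              # 28.0, same result as a flat sum
```

In Spark itself the change amounts to replacing `rdd.aggregate(zero)(seqOp, combOp)` with `rdd.treeAggregate(zero)(seqOp, combOp, depth)`; the result is identical, only the shape of the reduction changes.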
[jira] [Reopened] (SPARK-6273) Got error when one table's alias name is the same with other table's column name
[ https://issues.apache.org/jira/browse/SPARK-6273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reopened SPARK-6273: -- > Got error when one table's alias name is the same with other table's column name > > Key: SPARK-6273 > URL: https://issues.apache.org/jira/browse/SPARK-6273 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 1.2.1, 1.3.1 > Reporter: Jeff > > When one table's alias is the same as another table's column name, the query fails with an "Ambiguous references" error: > {code} > Error: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: > Ambiguous references to salary.pay_date: > (pay_date#34749,List()),(salary#34792,List(pay_date)), tree: > 'Filter 'salary.pay_date = 'time_by_day.the_date) && > ('time_by_day.the_year = 1997.0)) && ('salary.employee_id = > 'employee.employee_id)) && ('employee.store_id = 'store.store_id)) > Join Inner, None > Join Inner, None > Join Inner, None > MetastoreRelation yxqtest, time_by_day, Some(time_by_day) > MetastoreRelation yxqtest, salary, Some(salary) > MetastoreRelation yxqtest, store, Some(store) > MetastoreRelation yxqtest, employee, Some(employee) (state=,code=0) > {code}
[jira] [Resolved] (SPARK-6273) Got error when one table's alias name is the same with other table's column name
[ https://issues.apache.org/jira/browse/SPARK-6273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-6273. -- Resolution: Duplicate Fix Version/s: (was: 2.0.0) > Got error when one table's alias name is the same with other table's column name > > Key: SPARK-6273 > URL: https://issues.apache.org/jira/browse/SPARK-6273 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 1.2.1, 1.3.1 > Reporter: Jeff > > When one table's alias is the same as another table's column name, the query fails with an "Ambiguous references" error: > {code} > Error: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: > Ambiguous references to salary.pay_date: > (pay_date#34749,List()),(salary#34792,List(pay_date)), tree: > 'Filter 'salary.pay_date = 'time_by_day.the_date) && > ('time_by_day.the_year = 1997.0)) && ('salary.employee_id = > 'employee.employee_id)) && ('employee.store_id = 'store.store_id)) > Join Inner, None > Join Inner, None > Join Inner, None > MetastoreRelation yxqtest, time_by_day, Some(time_by_day) > MetastoreRelation yxqtest, salary, Some(salary) > MetastoreRelation yxqtest, store, Some(store) > MetastoreRelation yxqtest, employee, Some(employee) (state=,code=0) > {code}
[jira] [Updated] (SPARK-17027) PolynomialExpansion.choose is prone to integer overflow
[ https://issues.apache.org/jira/browse/SPARK-17027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-17027: -- Fix Version/s: (was: 1.6.3) > PolynomialExpansion.choose is prone to integer overflow > > Key: SPARK-17027 > URL: https://issues.apache.org/jira/browse/SPARK-17027 > Project: Spark > Issue Type: Bug > Components: ML > Affects Versions: 1.6.0, 2.0.0 > Reporter: Maciej Szymkiewicz > Assignee: Maciej Szymkiewicz > Priority: Minor > Fix For: 2.0.1, 2.1.0 > > The current implementation computes powers directly, so it is susceptible to integer overflow on relatively small input (4 features, degree 10). It would be better to use the recursive formula instead.
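To illustrate the "recursive formula" the ticket refers to, here is a standalone sketch of the multiplicative form of the binomial coefficient, C(n, k) = prod over i = 1..k of (n - k + i) / i. Every intermediate value in the loop is itself a binomial coefficient, so the division is exact and intermediates stay close to the final answer rather than growing like a factorial or a direct power. This is a plain-Python sketch (Python ints do not overflow, so it shows the formula rather than the Long overflow itself), not the actual Spark patch:

```python
def choose(n: int, k: int) -> int:
    """Binomial coefficient via the multiplicative formula.

    After step i the accumulator equals C(n - k + i, i), which is an
    integer, so the integer division is exact at every step and the
    largest intermediate is only about n times the final result.
    """
    k = min(k, n - k)          # symmetry: C(n, k) == C(n, n - k)
    res = 1
    for i in range(1, k + 1):
        res = res * (n - k + i) // i   # exact division at every step
    return res

print(choose(14, 10))  # 1001  (e.g. the 4-features, degree-10 regime)
print(choose(52, 5))   # 2598960
```

In a fixed-width setting such as Scala's `Long`, the same loop keeps intermediates small enough to avoid the overflow that a direct power/factorial computation hits.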
[jira] [Resolved] (SPARK-16843) Select features according to a percentile of the highest scores of ChiSqSelector
[ https://issues.apache.org/jira/browse/SPARK-16843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-16843. --- Resolution: Duplicate I'd like to call this a subset of the more general changes for chi-squared selection proposed in SPARK-17017 (see PR for more detail: https://github.com/apache/spark/pull/14597 ) > Select features according to a percentile of the highest scores of ChiSqSelector > > Key: SPARK-16843 > URL: https://issues.apache.org/jira/browse/SPARK-16843 > Project: Spark > Issue Type: New Feature > Components: MLlib > Reporter: Peng Meng > Priority: Minor > Original Estimate: 1h > Remaining Estimate: 1h > > It would be handy to add a percentile Param to ChiSqSelector, as in the scikit-learn one: > http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectPercentile.html
[jira] [Resolved] (SPARK-16885) Spark shell failed to run in yarn-client mode
[ https://issues.apache.org/jira/browse/SPARK-16885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-16885. --- Resolution: Not A Problem > Spark shell failed to run in yarn-client mode > > Key: SPARK-16885 > URL: https://issues.apache.org/jira/browse/SPARK-16885 > Project: Spark > Issue Type: Bug > Components: Spark Shell > Affects Versions: 2.0.0 > Environment: Ubuntu 12.04 Hadoop 2.7.2 + Yarn > Reporter: Yury Zhyshko > Attachments: spark-env.sh > > > I've installed Hadoop + Yarn in pseudo-distributed mode following these instructions: > https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SingleCluster.html#YARN_on_a_Single_Node > After that I downloaded and installed a prebuilt Spark for Hadoop 2.7. > The command that I used to run a shell: > ./bin/spark-shell --master yarn --deploy-mode client --conf > spark.yarn.archive=/home/yzhishko/work/spark/jars > Here is the error: > Setting default log level to "WARN". > To adjust logging level use sc.setLogLevel(newLevel). > 16/08/03 17:13:50 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > 16/08/03 17:13:52 ERROR spark.SparkContext: Error initializing SparkContext. 
> java.lang.IllegalArgumentException: Can not create a Path from an empty string > at org.apache.hadoop.fs.Path.checkPathArg(Path.java:126) > at org.apache.hadoop.fs.Path.&lt;init&gt;(Path.java:134) > at org.apache.hadoop.fs.Path.&lt;init&gt;(Path.java:93) > at > org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:338) > at > org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:433) > at > org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:472) > at > org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:834) > at > org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:167) > at > org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56) > at > org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:149) > at org.apache.spark.SparkContext.&lt;init&gt;(SparkContext.scala:500) > at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2256) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:831) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:823) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:823) > at org.apache.spark.repl.Main$.createSparkSession(Main.scala:101) > at $line3.$read$$iw$$iw.&lt;init&gt;(&lt;console&gt;:15) > at $line3.$read$$iw.&lt;init&gt;(&lt;console&gt;:31) > at $line3.$read.&lt;init&gt;(&lt;console&gt;:33) > at $line3.$read$.&lt;init&gt;(&lt;console&gt;:37) > at $line3.$read$.&lt;clinit&gt;(&lt;console&gt;) > at $line3.$eval$.$print$lzycompute(&lt;console&gt;:7) > at $line3.$eval$.$print(&lt;console&gt;:6) > at $line3.$eval.$print(&lt;console&gt;) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786) > at > 
scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047) > at > scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638) > at > scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637) > at > scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31) > at > scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19) > at > scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:637) > at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:569) > at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565) > at > scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:807) > at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:681) > at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:395) > at > org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply$mcV$sp(SparkILoop.scala:38) > at > org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:37) > at > org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:37) > at
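The issue above was closed as "Not A Problem". A plausible reading (my assumption, consistent with the resolution but not stated in the ticket) is that `spark.yarn.archive` was pointed at a local directory of jars (`/home/yzhishko/work/spark/jars`), whereas the property expects a single archive file, typically on HDFS; a directory of individual jars would instead go under `spark.yarn.jars`. A hedged sketch of the usual setup follows; all paths are hypothetical:

```shell
# spark.yarn.archive expects one archive file (zip/jar) containing the
# Spark jars, usually placed on HDFS, not a local directory of jars.
jar cv0f spark-libs.jar -C "$SPARK_HOME/jars/" .
hdfs dfs -mkdir -p /spark/jars
hdfs dfs -put spark-libs.jar /spark/jars/

# Point the config at the archive itself:
./bin/spark-shell --master yarn --deploy-mode client \
  --conf spark.yarn.archive=hdfs:///spark/jars/spark-libs.jar
```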
[jira] [Updated] (SPARK-17027) PolynomialExpansion.choose is prone to integer overflow
[ https://issues.apache.org/jira/browse/SPARK-17027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-17027: -- Assignee: Maciej Szymkiewicz > PolynomialExpansion.choose is prone to integer overflow > > Key: SPARK-17027 > URL: https://issues.apache.org/jira/browse/SPARK-17027 > Project: Spark > Issue Type: Bug > Components: ML > Affects Versions: 1.6.0, 2.0.0 > Reporter: Maciej Szymkiewicz > Assignee: Maciej Szymkiewicz > Priority: Minor > Fix For: 1.6.3, 2.0.1, 2.1.0 > > The current implementation computes powers directly, so it is susceptible to integer overflow on relatively small input (4 features, degree 10). It would be better to use the recursive formula instead.
[jira] [Resolved] (SPARK-17027) PolynomialExpansion.choose is prone to integer overflow
[ https://issues.apache.org/jira/browse/SPARK-17027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-17027. --- Resolution: Fixed Fix Version/s: 2.1.0 2.0.1 1.6.3 Issue resolved by pull request 14614 [https://github.com/apache/spark/pull/14614] > PolynomialExpansion.choose is prone to integer overflow > > Key: SPARK-17027 > URL: https://issues.apache.org/jira/browse/SPARK-17027 > Project: Spark > Issue Type: Bug > Components: ML > Affects Versions: 1.6.0, 2.0.0 > Reporter: Maciej Szymkiewicz > Priority: Minor > Fix For: 1.6.3, 2.0.1, 2.1.0 > > The current implementation computes powers directly, so it is susceptible to integer overflow on relatively small input (4 features, degree 10). It would be better to use the recursive formula instead.