I've created a JIRA issue for this:
https://issues.apache.org/jira/browse/SPARK-4967
Originally, I guess, we wanted to support scanning multiple Parquet file paths,
with those paths kept internally in a single string separated by commas;
however, I didn't find any public example saying we support
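If the paths really are carried internally as one comma-separated string, recovering the individual paths is a plain split. A minimal local sketch (the path strings below are made up for illustration):

```scala
object PathSplitDemo {
  // Hypothetical comma-separated path string, as described above.
  val joined = "/data/part1.parquet,/data/part2.parquet"

  // Split back into the individual file paths.
  def paths: Seq[String] = joined.split(",").toSeq

  def main(args: Array[String]): Unit = paths.foreach(println)
}
```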
Hi Kevin,
Were you able to build Spark with the command
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
mvn -Pdeb -DskipTests clean package ?
I am getting the below error for all versions of Spark (even 1.2.0):
Failed to execute goal org.vafer:jdeb:0.11:jdeb (default) on
Hi, I think a modeling tool may be helpful, because sometimes it's hard or
tricky to program Spark directly. I don't know if such a tool already
exists.
Thanks!
-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional
Hi Guys,
I hit an exception while running an application using the 1.2.0-SNAPSHOT version.
It shows like this:
2014-12-23 07:45:36,333 | ERROR | [Executor task launch worker-0] |
Exception in task 0.0 in stage 0.0 (TID 0) |
org.apache.spark.Logging$class.logError(Logging.scala:96)
Nice idea, although if I'm not wrong it needs a plan for hosting it, or
Spark itself would have to host it.
I've been using Slack for discussions; it's not exactly the same as
Discourse, the ML, or SO, but it offers interesting features.
It's more in the mood of IRC integrated with external services.
my2c
On Wed
Hi,
I got some issues with mapPartitions with the following piece of code:
val sessions = sc
.newAPIHadoopFile(
... path to an avro file ...,
classOf[org.apache.avro.mapreduce.AvroKeyInputFormat[ByteBuffer]],
classOf[AvroKey[ByteBuffer]],
Spark 1.2.0 is SO much more usable than previous releases -- many thanks to
the team for this release.
A question about progress of actions. I can see how things are progressing
using the Spark UI. I can also see the nice ASCII art animation on the
spark driver console.
Has anyone come up with
Thanks. I marked the variable as transient and moved ahead; now I am getting
an exception while executing the query.

final static transient SparkConf sparkConf = new SparkConf().setAppName("NumberCount");
final static transient JavaSparkContext jc = new JavaSparkContext(sparkConf);
static
Hi All,
I am new to both Scala and Spark, so please expect some mistakes.
Setup :
Scala : 2.10.2
Spark : Apache 1.1.0
Hadoop : Apache 2.4
Intent of the code: to read from a Kafka topic and do some processing.
Below are the code details and the error I am getting:
import org.apache.spark._
import
Sorry for the typo.
Apache Hadoop version is 2.6.0
Regards,
Sam
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/ReliableDeliverySupervisor-Association-with-remote-system-tp20859p20860.html
Sent from the Apache Spark User List mailing list archive at
Hi Users,
I am reading a CSV file and my data format is like:
key1,value1
key1,value2
key1,value1
key1,value3
key2,value1
key2,value5
key2,value5
key2,value4
key1,value4
key1,value4
key3,value1
key3,value1
key3,value2
required output :
key1:[value1,value2,value1,value3,value4,value4]
Hello all - can anyone please offer any advice on this issue?
-Ilya Ganelin
On Mon, Dec 22, 2014 at 5:36 PM, Ganelin, Ilya ilya.gane...@capitalone.com
wrote:
Hi all, I have a long running job iterating over a huge dataset. Parts of
this operation are cached. Since the job runs for so long,
The following command works
./make-distribution.sh --tgz -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4
-Dhadoop.version=2.6.0 -Phive -DskipTests
-- Original --
From: guxiaobo1982;guxiaobo1...@qq.com;
Send time: Thursday, Dec 25, 2014 3:58 PM
To:
Hi,
On Fri, Dec 26, 2014 at 5:22 AM, Amit Behera amit.bd...@gmail.com wrote:
How can I do it? Please help me to do.
Have you considered using groupByKey?
http://spark.apache.org/docs/latest/programming-guide.html#transformations
Tobias
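Spark's groupByKey collects all values for each key, which matches the key1:[value1,...] output asked for above. Its semantics can be sketched locally with Scala's groupBy on plain (key, value) pairs (data abridged from the question; this is a local-collections analogy, not RDD code):

```scala
object GroupByKeyDemo {
  // (key, value) pairs abridged from the CSV in the question.
  val pairs = Seq(
    "key1" -> "value1", "key1" -> "value2", "key1" -> "value1",
    "key1" -> "value3", "key2" -> "value1", "key2" -> "value5")

  // Local analogue of rdd.groupByKey(): values collected per key,
  // keeping each key's values in input order.
  def grouped: Map[String, Seq[String]] =
    pairs.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2) }

  def main(args: Array[String]): Unit =
    grouped.foreach { case (k, vs) => println(s"$k:[${vs.mkString(",")}]") }
}
```

On an actual RDD the pipeline would look something like `lines.map(_.split(",")).map(a => (a(0), a(1))).groupByKey()`; `map` and `groupByKey` are real RDD operations, but that exact pipeline is untested here.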
Hi,
On Fri, Dec 26, 2014 at 1:32 AM, ey-chih chow eyc...@hotmail.com wrote:
I got some issues with mapPartitions with the following piece of code:
val sessions = sc
.newAPIHadoopFile(
... path to an avro file ...,
I should rephrase my question as follows:
How can I use the Hadoop Configuration of a HadoopRDD when defining a
function that is passed to mapPartitions?
Thanks.
Ey-Chih Chow
Nick,
uh, I would have expected a rather heated discussion, but the opposite
seems to be the case ;-)
Independent of my personal preferences w.r.t. usability, habits etc., I
think it is not good for a software/tool/framework if questions and
discussions are spread over too many places. I guess
Hi,
On Fri, Dec 26, 2014 at 10:13 AM, ey-chih chow eyc...@hotmail.com wrote:
I should rephrase my question as follows:
How to use the corresponding Hadoop Configuration of a HadoopRDD in
defining
a function as an input parameter to the MapPartitions function?
Well, you could try to pull
Hello All,
I'm a newbie to Spark and Cassandra. I am trying to run the Spark demo that
ships with DSE Cassandra (the Portfolio demo) in a cluster environment, but cannot succeed.
This issue may not really come from Spark, but I am really not sure how
to investigate it further. Please help me.
There are 5 CentOS servers
Hi,
Hadoop Configuration is only Writable, not Java Serializable. You can use
SerializableWritable (in Spark) to wrap the Configuration to make it
serializable, and use a broadcast variable to broadcast this conf to all the
nodes; then you can use it in mapPartitions rather than serializing it
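A local sketch of that wrapper pattern, using plain JDK serialization. The payload class below is a made-up stand-in for a Writable; this is not Spark's actual SerializableWritable, only the same idea of writing the non-serializable payload's contents by hand:

```scala
import java.io._

object SerializableWrapperDemo {
  // Stand-in for a Writable: NOT java.io.Serializable.
  class FakeWritable(var value: String)

  // Wrapper that makes the payload survive serialization by hand,
  // analogous in spirit to Spark's SerializableWritable.
  class SerializableWrapper(@transient var payload: FakeWritable) extends Serializable {
    private def writeObject(out: ObjectOutputStream): Unit = {
      out.defaultWriteObject()
      out.writeUTF(payload.value) // write the payload's contents manually
    }
    private def readObject(in: ObjectInputStream): Unit = {
      in.defaultReadObject()
      payload = new FakeWritable(in.readUTF()) // rebuild the payload
    }
  }

  // Round-trip through Java serialization to show the payload survives.
  def roundTrip(w: SerializableWrapper): SerializableWrapper = {
    val bytes = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(bytes)
    oos.writeObject(w)
    oos.close()
    val ois = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    ois.readObject().asInstanceOf[SerializableWrapper]
  }

  def main(args: Array[String]): Unit =
    println(roundTrip(new SerializableWrapper(new FakeWritable("hdfs://namenode:9000"))).payload.value)
}
```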
Hi,
You can try reduceByKey also.
Something like this:

JavaPairRDD<String, String> ones = lines
        .mapToPair(new PairFunction<String, String, String>() {
            @Override
            public Tuple2<String, String> call(String s) {
                String[] parts = s.split(",");
                return new Tuple2<>(parts[0], parts[1]);
            }
        });
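For reference, reduceByKey merges all of a key's values with a binary function. Its semantics can be sketched locally as a groupBy followed by combining each group's values (the example data here is made up, and this is plain Scala, not RDD code):

```scala
object ReduceByKeyDemo {
  // Made-up (key, count) pairs.
  val pairs = Seq("key1" -> 1, "key2" -> 2, "key1" -> 3)

  // Local equivalent of reduceByKey(_ + _): group by key,
  // then combine each group's values with the binary function.
  def reduced: Map[String, Int] =
    pairs.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).reduce(_ + _) }

  def main(args: Array[String]): Unit = println(reduced)
}
```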
Hi,
I want to find the time taken to replicate an RDD in a Spark cluster, along
with the computation time on the replicated RDD.
Can someone please suggest some ideas?
Thank you
Hi,
Say I have created a clustering model using KMeans for 100 million
transactions at time t1. I am using streaming, and say every hour I
need to update my existing model. How do I do it? Should it include
all the data every time, or can it be incrementally updated?
If I can do an
Can you cross-check your cassandra-rackdc.properties and
cassandra-topology.properties files? It could be a misconfiguration. Also,
it's better to look at the Cassandra logs to see what's happening internally.
Thanks
Best Regards
On Fri, Dec 26, 2014 at 7:23 AM, Zhang Jiaqiang
We have a mirror of the user and developer mailing lists on Nabble, but
unfortunately this has led to significant usability issues because users
may attempt to post messages through Nabble which silently fail to get
posted to the actual Apache list and thus are never read by most
subscribers: