Re: Are tachyon and akka removed from 2.1.1 please

2017-05-23 Thread 萝卜丝炒饭
357...@qq.com>;"Gene Pang"<gene.p...@gmail.com>; Subject: Re: Are tachyon and akka removed from 2.1.1 please Akka has been replaced by netty in 1.6 On 22 May 2017 15:25, "Chin Wei Low" <lowchin...@gmail.com> wrote: I think akka has been removed since 2.0.

Re: Are tachyon and akka removed from 2.1.1 please

2017-05-23 Thread 萝卜丝炒饭
thanks Gene. ---Original--- From: "Gene Pang"<gene.p...@gmail.com> Date: 2017/5/22 22:19:47 To: "萝卜丝炒饭"<1427357...@qq.com>; Cc: "user"<user@spark.apache.org>; Subject: Re: Are tachyon and akka removed from 2.1.1 please Hi, Tachyon has be

Re: Are tachyon and akka removed from 2.1.1 please

2017-05-22 Thread vincent gromakowski
Akka has been replaced by netty in 1.6 On 22 May 2017 15:25, "Chin Wei Low" <lowchin...@gmail.com> wrote: > I think akka has been removed since 2.0. > > On 22 May 2017 10:19 pm, "Gene Pang" <gene.p...@gmail.com> wrote: > >> H

Re: Are tachyon and akka removed from 2.1.1 please

2017-05-22 Thread Chin Wei Low
I think akka has been removed since 2.0. On 22 May 2017 10:19 pm, "Gene Pang" <gene.p...@gmail.com> wrote: > Hi, > > Tachyon has been renamed to Alluxio. Here is the documentation for > running Alluxio with Spark > <http://www.alluxio.org/docs/master/en/Runnin

Re: Are tachyon and akka removed from 2.1.1 please

2017-05-22 Thread Gene Pang
Hi, Tachyon has been renamed to Alluxio. Here is the documentation for running Alluxio with Spark <http://www.alluxio.org/docs/master/en/Running-Spark-on-Alluxio.html>. Hope this helps, Gene On Sun, May 21, 2017 at 6:15 PM, 萝卜丝炒饭 <1427357...@qq.com> wrote: > HI all, > Irea

Are tachyon and akka removed from 2.1.1 please

2017-05-21 Thread 萝卜丝炒饭
Hi all, I read some papers about the source code; the papers are based on version 1.2 and refer to Tachyon and Akka. When I read the 2.1 code, I cannot find the code about Akka and Tachyon. Are Tachyon and Akka removed from 2.1.1, please?

Re: off heap to alluxio/tachyon in Spark 2

2016-09-19 Thread Bin Fan
Hi, If you are looking for how to run Spark on Alluxio (formerly Tachyon), here is the documentation from the Alluxio doc site: http://www.alluxio.org/docs/master/en/Running-Spark-on-Alluxio.html It still works for Spark 2.x. The Alluxio team has also published articles on when and why to run Spark (2.x
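For reference, a minimal sketch of what reading and writing an Alluxio path from a Spark 2.x shell looks like, assuming the Alluxio client jar is on the classpath and an Alluxio master at alluxio-master:19998 (both placeholders):

val lines = sc.textFile("alluxio://alluxio-master:19998/input/data.txt")   // read a file stored in Alluxio
val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
counts.saveAsTextFile("alluxio://alluxio-master:19998/output/wordcount")   // write results back to Alluxio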

Re: off heap to alluxio/tachyon in Spark 2

2016-09-19 Thread Sean Owen
...@gmail.com> wrote: > Here is my understanding. > > Spark used Tachyon as an off-heap solution for RDDs. In certain situations, > it would alleviate garbage collection of the RDDs. > > Tungsten, Spark 2’s off-heap (columnar format) is much more efficient and > used as the de

Re: off heap to alluxio/tachyon in Spark 2

2016-09-19 Thread Richard Catlin
Here is my understanding. Spark used Tachyon as an off-heap solution for RDDs. In certain situations, it would alleviate garbage collection of the RDDs. Tungsten, Spark 2’s off-heap (columnar format), is much more efficient and used as the default. Alluxio no longer makes sense for this use
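For context, Tungsten's native off-heap mode in Spark 2 is switched on through configuration; a rough sketch, with placeholder values rather than recommendations:

import org.apache.spark.{SparkConf, SparkContext}
val conf = new SparkConf()
  .setAppName("offheap-example")
  .set("spark.memory.offHeap.enabled", "true")   // let Tungsten allocate execution/storage memory off-heap
  .set("spark.memory.offHeap.size", "2g")        // size of the off-heap pool (placeholder value)
val sc = new SparkContext(conf)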

off heap to alluxio/tachyon in Spark 2

2016-09-19 Thread aka.fe2s
Hi folks, What has happened with Tachyon / Alluxio in Spark 2? The docs no longer mention it. -- Oleksiy Dyagilev

Re: TTransportException when using Spark 1.6.0 on top of Tachyon 0.8.2

2016-02-01 Thread Jia Zou
Hi, Calvin, I am running 24GB of data through Spark KMeans on a c3.2xlarge AWS instance with 30GB of physical memory. Spark caches data off-heap to Tachyon, and the input data is also stored in Tachyon. Tachyon is configured to use 15GB of memory with tiered storage. The Tachyon underFS is /tmp. The only

Re: TTransportException when using Spark 1.6.0 on top of Tachyon 0.8.2

2016-01-29 Thread cc
Hey, Jia Zou, I'm curious about this exception. The error log you showed indicates the exception is related to unlockBlock; could you upload your full master.log and worker.log under the tachyon/logs directory? Best, Cheng On Friday, 29 January 2016 at 11:11:19 UTC+8, Calvin Jia wrote: > > Hi, >

Re: TTransportException when using Spark 1.6.0 on top of Tachyon 0.8.2

2016-01-28 Thread Calvin Jia
Hi, Thanks for the detailed information. How large is the dataset you are running against? Also did you change any Tachyon configurations? Thanks, Calvin - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org

TTransportException when using Spark 1.6.0 on top of Tachyon 0.8.2

2016-01-27 Thread Jia Zou
Dears, I keep getting below exception when using Spark 1.6.0 on top of Tachyon 0.8.2. Tachyon is 93% used and configured as CACHE_THROUGH. Any suggestions will be appreciated, thanks! = Exception in thread "main" org.apache.spark.SparkException: J

Re: TTransportException when using Spark 1.6.0 on top of Tachyon 0.8.2

2016-01-27 Thread Jia Zou
BTW, the Tachyon worker log says the following: 2015-12-27 01:33:44,599 ERROR WORKER_LOGGER (WorkerBlockMasterClient.java:getId) - java.net.SocketException: Connection reset org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset

Re: TTransportException when using Spark 1.6.0 on top of Tachyon 0.8.2

2016-01-27 Thread Jia Zou
) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) On Wed, Jan 27, 2016 at 5:53 AM, Jia Zou <jacqueline...@gmail.com> wrote: > BTW. The tachyon worker log says

Re: TTransportException when using Spark 1.6.0 on top of Tachyon 0.8.2

2016-01-27 Thread Jia Zou
BTW, the error happens when configuring Spark to read the input file from Tachyon as follows: /home/ubuntu/spark-1.6.0/bin/spark-submit --properties-file /home/ubuntu/HiBench/report/kmeans/spark/java/conf/sparkbench/spark.conf --class org.apache.spark.examples.mllib.JavaKMeans --master spark://ip

Re: How to query data in tachyon with spark-sql

2016-01-24 Thread Gene Pang
Hi, You should be able to point Hive to Tachyon instead of HDFS, and that should allow Hive to access data in Tachyon. If Spark SQL was pointing to an HDFS file, you could instead point it to a Tachyon file, and that should work too. Hope that helps, Gene On Wed, Jan 20, 2016 at 2:06 AM, Sea
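To illustrate Gene's suggestion, a sketch of swapping an HDFS path for a Tachyon path in Spark SQL (host, port, and paths are placeholders; assumes the Tachyon client is on the classpath):

// before: val df = sqlContext.parquetFile("hdfs://namenode:8020/warehouse/events")
val df = sqlContext.parquetFile("tachyon://tachyon-master:19998/warehouse/events")
df.registerTempTable("events")
sqlContext.sql("SELECT COUNT(*) FROM events").show()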

How to query data in tachyon with spark-sql

2016-01-20 Thread Sea
Hi all, I want to mount some Hive tables in Tachyon, but I don't know how to query the data in Tachyon with spark-sql. Does anyone know?

Re: Saving RDDs in Tachyon

2015-12-09 Thread Calvin Jia
Hi Mark, Were you able to successfully store the RDD with Akhil's method? When you read it back as an objectFile, you will also need to specify the correct type. You can find more information about integrating Spark and Tachyon on this page: http://tachyon-project.org/documentation/Running-Spark
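For example, a sketch of that flow with the element type made explicit on the read side (the path and MyRecord type are placeholders, and rdd is assumed to be an RDD[MyRecord]):

case class MyRecord(id: Long, value: String)
rdd.saveAsObjectFile("tachyon://localhost:19998/shared/records")                     // write Java-serialized objects
val restored = sc.objectFile[MyRecord]("tachyon://localhost:19998/shared/records")   // type must match what was saved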

how often you use Tachyon to accelerate Spark

2015-12-06 Thread Arvin
Hi all, I have some questions about Tachyon and Spark. I found that the interaction between Spark and Tachyon is caching RDDs off-heap. I wonder whether you use Tachyon frequently, for example to cache RDDs. Does caching RDDs in Tachyon have a significant effect in accelerating Spark

Re: Saving RDDs in Tachyon

2015-10-30 Thread Akhil Das
I guess you can do a .saveAsObjectFile and read it back with sc.objectFile Thanks Best Regards On Fri, Oct 23, 2015 at 7:57 AM, mark <manwoodv...@googlemail.com> wrote: > I have Avro records stored in Parquet files in HDFS. I want to read these > out as an RDD and save that RD

Re: How does Spark coordinate with Tachyon wrt data locality

2015-10-23 Thread Calvin Jia
Hi Shane, Tachyon provides an api to get the block locations of the file which Spark uses when scheduling tasks. Hope this helps, Calvin On Fri, Oct 23, 2015 at 8:15 AM, Kinsella, Shane <shane.kinse...@aspect.com> wrote: > Hi all, > > > > I am looking into how Spark hand

How does Spark coordinate with Tachyon wrt data locality

2015-10-23 Thread Kinsella, Shane
Hi all, I am looking into how Spark handles data locality wrt Tachyon. My main concern is how this is coordinated. Will it send a task based on a file loaded from Tachyon to a node that it knows has that file locally, and how does it know which nodes have what? Kind regards, Shane This email

Saving RDDs in Tachyon

2015-10-22 Thread mark
I have Avro records stored in Parquet files in HDFS. I want to read these out as an RDD and save that RDD in Tachyon for any spark job that wants the data. How do I save the RDD in Tachyon? What format do I use? Which RDD 'saveAs...' method do I want? Thanks

Re: Spark Streaming with Tachyon : Data Loss on Receiver Failure due to WAL error

2015-09-26 Thread N B
Hi Dibyendu, I am not sure I understand completely. But are you suggesting that currently there is no way to enable Checkpoint directory to be in Tachyon? Thanks Nikunj On Fri, Sep 25, 2015 at 11:49 PM, Dibyendu Bhattacharya < dibyendu.bhattach...@gmail.com> wrote: > Hi, >

Re: Spark Streaming with Tachyon : Data Loss on Receiver Failure due to WAL error

2015-09-26 Thread Dibyendu Bhattacharya
Hi, Recently I was working on a PR to use Tachyon as the OFF_HEAP store for Spark Streaming and make sure Spark Streaming can recover from driver failure and recover the blocks from Tachyon. The motivation for this PR is: If a streaming application stores the blocks OFF_HEAP, it may not need
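Roughly, the setup under discussion looks like the sketch below (host, port, and paths are placeholders, conf is an existing SparkConf; whether Tachyon itself can hold the metadata checkpoint is exactly what the rest of the thread explores):

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
val ssc = new StreamingContext(conf, Seconds(10))
ssc.checkpoint("hdfs://namenode:8020/streaming/checkpoint")                    // metadata checkpoint directory
val lines = ssc.socketTextStream("source-host", 9999, StorageLevel.OFF_HEAP)   // received blocks stored off-heap (Tachyon)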

Re: Spark Streaming with Tachyon : Data Loss on Receiver Failure due to WAL error

2015-09-26 Thread N B
er > without using any WAL like feature because Blocks are already available in > Tachyon. The Meta Data Checkpoint helps to recover the meta data about past > received blocks. > > Now the question is , can I configure Tachyon as my Metadata Checkpoint > location ? I tried that ,

Re: Spark Streaming with Tachyon : Data Loss on Receiver Failure due to WAL error

2015-09-26 Thread N B
pointing and WAL. You do not need to >> enable Data Check pointing. >> >> From my experiments and the PR I mentioned , I configured the Meta Data >> Check Pointing in HDFS , and stored the Received Blocks OFF_HEAP. And I >> did not use any WAL . The PR I proposed woul

BlockNotFoundException when running spark word count on Tachyon

2015-08-26 Thread Todd
I am using Tachyon in the Spark program below, but I encounter a BlockNotFoundException. Does someone know what's wrong, and is there a guide on how to configure Spark to work with Tachyon? Thanks! conf.set("spark.externalBlockStore.url", "tachyon://10.18.19.33:19998") conf.set
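With the quoting fixed, a fuller sketch of this kind of configuration (addresses and directories are placeholders; spark.externalBlockStore.* is the naming used in that Spark 1.5/1.6 era):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel
val conf = new SparkConf()
  .setAppName("tachyon-wordcount")
  .set("spark.externalBlockStore.url", "tachyon://10.18.19.33:19998")   // Tachyon master used as the external block store
  .set("spark.externalBlockStore.baseDir", "/spark")                    // directory in Tachyon for Spark blocks (assumed value)
val sc = new SparkContext(conf)
val words = sc.textFile("tachyon://10.18.19.33:19998/input.txt").flatMap(_.split(" "))
words.map((_, 1)).reduceByKey(_ + _).persist(StorageLevel.OFF_HEAP)     // OFF_HEAP blocks go to the external block store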

Re: BlockNotFoundException when running spark word count on Tachyon

2015-08-26 Thread Dibyendu Bhattacharya
Sometime back I was playing with Spark and Tachyon and I also found this issue. The issue here is that TachyonBlockManager puts the blocks with the WriteType.TRY_CACHE configuration. Because of this, blocks are evicted from the Tachyon cache when memory is full, and when Spark tries to find the block it throws

Re: BlockNotFoundException when running spark word count on Tachyon

2015-08-26 Thread Dibyendu Bhattacharya
The URL seems to have changed .. here is the one .. http://tachyon-project.org/documentation/Tiered-Storage-on-Tachyon.html On Wed, Aug 26, 2015 at 12:32 PM, Dibyendu Bhattacharya dibyendu.bhattach...@gmail.com wrote: Sometime back I was playing with Spark and Tachyon and I also found

Re: Spark is in-memory processing, how then can Tachyon make Spark faster?

2015-08-07 Thread andy petrella
Exactly! The sharing part is used in the Spark Notebook (this one https://github.com/andypetrella/spark-notebook/blob/master/notebooks/Tachyon%20Test.snb) so we can share things between notebooks, which are different SparkContexts (in different JVMs). OTOH, we have a project that creates micro services

Re: tachyon

2015-08-07 Thread Abhishek R. Singh
Thanks Calvin - much appreciated ! -Abhishek- On Aug 7, 2015, at 11:11 AM, Calvin Jia jia.cal...@gmail.com wrote: Hi Abhishek, Here's a production use case that may interest you: http://www.meetup.com/Tachyon/events/222485713/ Baidu is using Tachyon to manage more than 100 nodes

Spark is in-memory processing, how then can Tachyon make Spark faster?

2015-08-07 Thread Muler
Spark is an in-memory engine and attempts to do computation in-memory. Tachyon is memory-centric distributed storage, OK, but how would that help run Spark faster?

Re: tachyon

2015-08-07 Thread Ted Yu
Looks like you would get better response on Tachyon's mailing list: https://groups.google.com/forum/?fromgroups#!forum/tachyon-users Cheers On Fri, Aug 7, 2015 at 9:56 AM, Abhishek R. Singh abhis...@tetrationanalytics.com wrote: Do people use Tachyon in production, or is it experimental

tachyon

2015-08-07 Thread Abhishek R. Singh
Do people use Tachyon in production, or is it experimental grade still? Regards, Abhishek - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org

Re: Spark is in-memory processing, how then can Tachyon make Spark faster?

2015-08-07 Thread Calvin Jia
Hi, Tachyon http://tachyon-project.org manages memory off heap which can help prevent long GC pauses. Also, using Tachyon will allow the data to be shared between Spark jobs if they use the same dataset. Here's http://www.meetup.com/Tachyon/events/222485713/ a production use case where Baidu

Re: How can I use Tachyon with SPARK?

2015-06-15 Thread Himanshu Mehra
Hi June, As I understand your problem, you are running Spark 1.3 and want to use Tachyon with it. What you need to do is simply build the latest Spark and Tachyon and set some configuration in Spark. In fact, Spark 1.3 has spark/core/pom.xml; you have to find the core folder in your Spark home

Re: Spark Streaming with Tachyon : Data Loss on Receiver Failure due to WAL error

2015-05-21 Thread Tathagata Das
of Spark Streaming with Tachyon as OFF_HEAP block store. As I said in earlier email, I could able to solve the BlockNotFound exception when I used Hierarchical Storage of Tachyon , which is good. I continue doing some testing around storing the Spark Streaming WAL and CheckPoint files also

Re: Spark Streaming with Tachyon : Data Loss on Receiver Failure due to WAL error

2015-05-21 Thread Dibyendu Bhattacharya
Hi Tathagata, Thanks for looking into this. Investigating further, I found that the issue is that Tachyon does not support file append. The streaming receiver, which writes to the WAL, when failed and then restarted is not able to append to the same WAL file after restart. I raised this with the Tachyon user

Fast big data analytics with Spark on Tachyon in Baidu

2015-05-12 Thread Haoyuan Li
Dear all, We’re organizing a meetup http://www.meetup.com/Tachyon/events/222485713/ on May 28th at IBM in Foster City that might be of interest to the Spark community. The focus is a production use case of Spark and Tachyon at Baidu. You can sign up here: http://www.meetup.com/Tachyon/events

Re: Spark SQL 1.3.1 saveAsParquetFile will output tachyon file with different block size

2015-04-28 Thread Calvin Jia
Hi, You can apply this patch https://github.com/apache/spark/pull/5354 and recompile. Hope this helps, Calvin On Tue, Apr 28, 2015 at 1:19 PM, sara mustafa eng.sara.must...@gmail.com wrote: Hi Zhang, How did you compile Spark 1.3.1 with Tachyon? when i changed Tachyon version to 0.6.3

tachyon on machines launched with spark-ec2 scripts

2015-04-24 Thread Daniel Mahler
I have a cluster launched with spark-ec2. I can see a TachyonMaster process running, but I do not seem to be able to use Tachyon from the spark-shell. If I try rdd.saveAsTextFile("tachyon://localhost:19998/path") I get 15/04/24 19:18:31 INFO TaskSetManager: Starting task 12.2 in stage 1.0 (TID

Re: Why does the HDFS parquet file generated by Spark SQL have different size with those on Tachyon?

2015-04-17 Thread Reynold Xin
zhangxiongfei0...@163.com wrote: Hi, I did some tests on Parquet files with the Spark SQL DataFrame API. I generated 36 gzip-compressed Parquet files with Spark SQL and stored them on Tachyon. The size of each file is about 222M. Then I read them with the code below. val tfs = sqlContext.parquetFile

Re: Spark SQL 1.3.1 saveAsParquetFile will output tachyon file with different block size

2015-04-14 Thread Cheng Lian
Would you mind opening a JIRA for this? I think your suspicion makes sense. Will have a look at this tomorrow. Thanks for reporting! Cheng On 4/13/15 7:13 PM, zhangxiongfei wrote: Hi experts, I ran the code below in the Spark shell to access Parquet files in Tachyon. 1. First, created a DataFrame
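If the difference really is the Parquet row-group ("block") size, one way it is commonly adjusted is via the Hadoop configuration; this is only a sketch, not the fix in the patch discussed above, and the key name parquet.block.size plus the 128 MB value are illustrative assumptions (df stands for the DataFrame being written):

sc.hadoopConfiguration.setInt("parquet.block.size", 128 * 1024 * 1024)   // target row-group size before writing
df.saveAsParquetFile("tachyon://tachyon-master:19998/parquet_table")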

Re: deployment of spark on mesos and data locality in tachyon/hdfs

2015-04-01 Thread Haoyuan Li
Response inline. On Tue, Mar 31, 2015 at 10:41 PM, Sean Bigdatafun sean.bigdata...@gmail.com wrote: (resending...) I was thinking the same setup… But the more I think of this problem, the more interesting it becomes. If we allocate 50% of total memory to Tachyon statically

Re: deployment of spark on mesos and data locality in tachyon/hdfs

2015-03-31 Thread Sean Bigdatafun
(resending...) I was thinking the same setup… But the more I think of this problem, the more interesting it becomes. If we allocate 50% of total memory to Tachyon statically, then the Mesos benefits of dynamically scheduling resources go away altogether. Can Tachyon be resource managed

deployment of spark on mesos and data locality in tachyon/hdfs

2015-03-31 Thread Ankur Chauhan
Hi, I am fairly new to the spark ecosystem and I have been trying to setup a spark on mesos deployment. I can't seem to figure out the best practices around HDFS and Tachyon. The documentation about Spark's data-locality section seems to point

Re: deployment of spark on mesos and data locality in tachyon/hdfs

2015-03-31 Thread Haoyuan Li
Tachyon should be co-located with Spark in this case. Best, Haoyuan On Tue, Mar 31, 2015 at 4:30 PM, Ankur Chauhan achau...@brightcove.com wrote: Hi, I am fairly new to the spark ecosystem and I have been trying to setup a spark on mesos

Re: deployment of spark on mesos and data locality in tachyon/hdfs

2015-03-31 Thread Ankur Chauhan
Hi Haoyuan, So on each mesos slave node I should allocate/section off some amount of memory for tachyon (let's say 50% of the total memory) and the rest for regular mesos tasks? This means, on each slave node I would have tachyon worker (+ hdfs

Re: rdd.toDF().saveAsParquetFile(tachyon://host:19998/test)

2015-03-28 Thread Yin Huai
You are hitting https://issues.apache.org/jira/browse/SPARK-6330. It has been fixed in 1.3.1, which will be released soon. On Fri, Mar 27, 2015 at 10:42 PM, sud_self 852677...@qq.com wrote: spark version is 1.3.0 with tachyon-0.6.1 QUESTION DESCRIPTION: rdd.saveAsObjectFile(tachyon://host

rdd.toDF().saveAsParquetFile(tachyon://host:19998/test)

2015-03-27 Thread sud_self
Spark version is 1.3.0 with tachyon-0.6.1. QUESTION DESCRIPTION: rdd.saveAsObjectFile("tachyon://host:19998/test") and rdd.saveAsTextFile("tachyon://host:19998/test") succeed, but rdd.toDF().saveAsParquetFile("tachyon://host:19998/test") fails. ERROR MESSAGE

Re: RE: Building spark over specified tachyon

2015-03-15 Thread fightf...@163.com
Thanks, Jerry, I'll do it that way. Just want to make sure whether there is some option to directly specify the Tachyon version. fightf...@163.com From: Shao, Saisai Date: 2015-03-16 11:10 To: fightf...@163.com CC: user Subject: RE: Building spark over specified tachyon I think you could change

RE: Building spark over specified tachyon

2015-03-15 Thread Shao, Saisai
I think you could change the pom file under the Spark project to update the Tachyon-related dependency version and rebuild again (provided the API is compatible and the behavior is the same). I'm not sure whether there is any command you can use to compile against a specific Tachyon version. Thanks Jerry From: fightf

Building spark over specified tachyon

2015-03-15 Thread fightf...@163.com
Hi, all Noting that the current Spark releases are built with Tachyon 0.5.0, if we want to recompile Spark with Maven targeting a specific Tachyon version (let's say the most recent 0.6.0 release), how should that be done? What should the Maven compile command look like? Thanks, Sun

Re: Re: Building spark over specified tachyon

2015-03-15 Thread fightf...@163.com
Thanks haoyuan. fightf...@163.com From: Haoyuan Li Date: 2015-03-16 12:59 To: fightf...@163.com CC: Shao, Saisai; user Subject: Re: RE: Building spark over specified tachyon Here is a patch: https://github.com/apache/spark/pull/4867 On Sun, Mar 15, 2015 at 8:46 PM, fightf...@163.com fightf

Re: A way to share RDD directly using Tachyon?

2015-03-09 Thread Akhil Das
Did you try something like: myRDD.saveAsObjectFile("tachyon://localhost:19998/Y") val newRDD = sc.objectFile[MyObject]("tachyon://localhost:19998/Y") Thanks Best Regards On Sun, Mar 8, 2015 at 3:59 PM, Yijie Shen henry.yijies...@gmail.com wrote: Hi, I would like to share a RDD in several Spark

A way to share RDD directly using Tachyon?

2015-03-08 Thread Yijie Shen
Hi, I would like to share an RDD across several Spark applications, i.e., create one in application A, publish the ID somewhere, and get the RDD back directly using the ID in application B. I know I can use Tachyon just as a filesystem and call s.saveAsTextFile("tachyon://localhost:19998/Y") like

Store the shuffled files in memory using Tachyon

2015-03-06 Thread sara mustafa
Hi all, Is it possible to store Spark shuffle files in distributed memory like Tachyon instead of spilling them to disk? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Store-the-shuffled-files-in-memory-using-Tachyon-tp21944.html Sent from the Apache

Re: Spark or Tachyon: capture data lineage

2015-01-02 Thread Sven Krasser
Agreed with Jerry. Aside from Tachyon, seeing this for general debugging would be very helpful. Haoyuan, is that feature you are referring to related to https://issues.apache.org/jira/browse/SPARK-975? In the interim, I've found the toDebugString() method useful (but it renders execution

Re: Spark or Tachyon: capture data lineage

2015-01-02 Thread Haoyuan Li
Jerry, Great question. Spark and Tachyon capture lineage information at different granularities. We are working on an integration between Spark/Tachyon about this. Hope to get it ready to be released soon. Best, Haoyuan On Fri, Jan 2, 2015 at 12:24 PM, Jerry Lam chiling...@gmail.com wrote

Spark or Tachyon: capture data lineage

2015-01-02 Thread Jerry Lam
Hi spark developers, I was thinking it would be nice to extract the data lineage information from a data processing pipeline. I assume that spark/tachyon keeps this information somewhere. For instance, a data processing pipeline uses datasource A and B to produce C. C is then used by another

Re: UpdateStateByKey persist to Tachyon

2014-12-31 Thread amkcom
bumping this thread up -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/UpdateStateByKey-persist-to-Tachyon-tp20798p20930.html Sent from the Apache Spark User List mailing list archive at Nabble.com

Re: Spark on Tachyon

2014-12-20 Thread Peng Cheng
IMHO: cache doesn't provide redundancy, and it's in the same JVM, so it's much faster. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-on-Tachyon-tp1463p20800.html Sent from the Apache Spark User List mailing list archive at Nabble.com

spark-ec2 starts hdfs1, tachyon but not spark

2014-12-17 Thread Al Thompson
myclus All seem to run OK. However, I got no web UIs for the Spark master or slaves. Logging into the nodes, I see HDFS and Tachyon processes but none for Spark. The /root/tachyon folder has a full complement of files including conf, logs and so forth: $ ls /root/tachyon bin docs libexec logs

Re: Persist kafka streams to text file, tachyon error?

2014-11-22 Thread Haoyuan Li
StorageLevel.OFF_HEAP requires running Tachyon: http://spark.apache.org/docs/latest/programming-guide.html If you don't know whether you have Tachyon or not, you probably don't :) http://tachyon-project.org/ For local testing, you can use other persist() solutions without running Tachyon. Best
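A small sketch of the distinction being drawn here, using the standard storage levels (rdd stands for any cacheable RDD; pick one level per RDD):

import org.apache.spark.storage.StorageLevel
// with a Tachyon deployment running:
rdd.persist(StorageLevel.OFF_HEAP)
// or, for local testing without Tachyon (do not call persist twice with different levels on the same RDD):
// rdd.persist(StorageLevel.MEMORY_AND_DISK_SER)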

Persist kafka streams to text file, tachyon error?

2014-11-21 Thread Joanne Contact
the following errors. It is related to Tachyon. But I don't know if I have tachyon or not. 14/11/21 14:17:54 WARN storage.TachyonBlockManager: Attempt 1 to create tachyon dir null failed java.io.IOException: Failed to connect to master localhost/127.0.0.1:19998 after 5 attempts

Storing shuffle files on a Tachyon

2014-10-07 Thread Soumya Simanta
Is it possible to store spark shuffle files on Tachyon ?

Re: spark-ec2 script with Tachyon

2014-09-26 Thread mrm
Hi, Did you manage to figure this out? I would appreciate if you could share the answer. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-ec2-script-with-Tachyon-tp9996p15249.html Sent from the Apache Spark User List mailing list archive at Nabble.com

[Tachyon] Error reading from Parquet files in HDFS

2014-08-21 Thread Evan Chan
Spark 1.0.2, Tachyon 0.4.1, Hadoop 1.0 (standard EC2 config) scala val gdeltT = sqlContext.parquetFile(tachyon://172.31.42.40:19998/gdelt-parquet/1979-2005/) 14/08/21 19:07:14 INFO : initialize(tachyon://172.31.42.40:19998/gdelt-parquet/1979-2005, Configuration: core-default.xml, core-site.xml

Re: [Tachyon] Error reading from Parquet files in HDFS

2014-08-21 Thread Evan Chan
The underFS is HDFS btw. On Thu, Aug 21, 2014 at 12:22 PM, Evan Chan velvia.git...@gmail.com wrote: Spark 1.0.2, Tachyon 0.4.1, Hadoop 1.0 (standard EC2 config) scala val gdeltT = sqlContext.parquetFile(tachyon://172.31.42.40:19998/gdelt-parquet/1979-2005/) 14/08/21 19:07:14 INFO

Re: [Tachyon] Error reading from Parquet files in HDFS

2014-08-21 Thread Evan Chan
And it worked earlier with non-parquet directory. On Thu, Aug 21, 2014 at 12:22 PM, Evan Chan velvia.git...@gmail.com wrote: The underFS is HDFS btw. On Thu, Aug 21, 2014 at 12:22 PM, Evan Chan velvia.git...@gmail.com wrote: Spark 1.0.2, Tachyon 0.4.1, Hadoop 1.0 (standard EC2 config

First Bay Area Tachyon meetup: August 25th, hosted by Yahoo! (Limited Space)

2014-08-19 Thread Haoyuan Li
Hi folks, We've posted the first Tachyon meetup, which will be on August 25th and is hosted by Yahoo! (Limited Space): http://www.meetup.com/Tachyon/events/200387252/ . Hope to see you there! Best, Haoyuan -- Haoyuan Li AMPLab, EECS, UC Berkeley http://www.cs.berkeley.edu/~haoyuan/

Re: First Bay Area Tachyon meetup: August 25th, hosted by Yahoo! (Limited Space)

2014-08-19 Thread Christopher Nguyen
Fantastic! Sent while mobile. Pls excuse typos etc. On Aug 19, 2014 4:09 PM, Haoyuan Li haoyuan...@gmail.com wrote: Hi folks, We've posted the first Tachyon meetup, which will be on August 25th and is hosted by Yahoo! (Limited Space): http://www.meetup.com/Tachyon/events/200387252/ . Hope

Re: share/reuse off-heap persisted (tachyon) RDD in SparkContext or saveAsParquetFile on tachyon in SQLContext

2014-08-12 Thread chutium
spark.speculation was not set; is there any speculative execution on the Tachyon side? tachyon-env.sh only changed the following: export TACHYON_MASTER_ADDRESS=test01.zala #export TACHYON_UNDERFS_ADDRESS=$TACHYON_HOME/underfs export TACHYON_UNDERFS_ADDRESS=hdfs://test01.zala:8020 export TACHYON_WORKER_MEMORY_SIZE

Re: share/reuse off-heap persisted (tachyon) RDD in SparkContext or saveAsParquetFile on tachyon in SQLContext

2014-08-12 Thread chutium
more interesting is if spark-shell started on master node (test01) then parquetFile.saveAsParquetFile(tachyon://test01.zala:19998/parquet_tablex) 14/08/12 11:42:06 INFO : initialize(tachyon://... ... ... 14/08/12 11:42:06 INFO : File does not exist: tachyon://test01.zala:19998/parquet_tablex

share/reuse off-heap persisted (tachyon) RDD in SparkContext or saveAsParquetFile on tachyon in SQLContext

2014-08-11 Thread chutium
Sharing/reusing RDDs is always useful for many use cases; is this possible via persisting an RDD on Tachyon? Such as off-heap persisting a named RDD into a given path (instead of /tmp_spark_tachyon/spark-xxx-xxx-xxx), or saveAsParquetFile on Tachyon. I tried to save a SchemaRDD on Tachyon, val

Re: share/reuse off-heap persisted (tachyon) RDD in SparkContext or saveAsParquetFile on tachyon in SQLContext

2014-08-11 Thread Haoyuan Li
Is speculative execution enabled? Best, Haoyuan On Mon, Aug 11, 2014 at 8:08 AM, chutium teng@gmail.com wrote: sharing/reusing RDDs is always useful for many use cases, is this possible via persisting RDD on tachyon? such as off heap persist a named RDD into a given path (instead

Re: Spark-sql with Tachyon cache

2014-08-02 Thread Michael Armbrust
We are investigating various ways to integrate with Tachyon. I'll note that you can already use saveAsParquetFile and parquetFile(...).registerAsTable(tableName) (soon to be registerTempTable in Spark 1.1) to store data into tachyon and query it with Spark SQL. On Fri, Aug 1, 2014 at 1:42 AM
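Roughly what that looks like as a sketch (paths and the table name are placeholders, schemaRDD stands for an existing SchemaRDD; registerAsTable is the Spark 1.0 API name, becoming registerTempTable in 1.1 as noted above):

schemaRDD.saveAsParquetFile("tachyon://tachyon-master:19998/tables/events")    // write the data into Tachyon as Parquet
sqlContext.parquetFile("tachyon://tachyon-master:19998/tables/events").registerAsTable("events")
sqlContext.sql("SELECT COUNT(*) FROM events").collect()                        // query it with Spark SQL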

spark-ec2 script with Tachyon

2014-07-16 Thread nit
Hi, It seems that the spark-ec2 script deploys the Tachyon module along with the rest of the setup. I am trying to use .persist(OFF_HEAP) for RDD persistence, but on the worker I see this error -- Failed to connect (2) to master localhost/127.0.0.1:19998 : java.net.ConnectException: Connection refused -- From
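The localhost:19998 in that error usually means the Tachyon master address was never configured; in the Spark 1.x of that era the relevant settings looked roughly like this sketch (the EC2 master hostname is a placeholder, and the exact property names should be checked against your Spark version):

import org.apache.spark.SparkConf
val conf = new SparkConf()
  .set("spark.tachyonStore.url", "tachyon://ec2-master-hostname:19998")   // point at the cluster's Tachyon master
  .set("spark.tachyonStore.baseDir", "/tmp_spark_tachyon")                // base directory inside Tachyon (assumed value)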

Re: Cannot create dir in Tachyon when running Spark with OFF_HEAP caching (FileDoesNotExistException)

2014-07-08 Thread Teng Long
More updates: It seems that in TachyonBlockManager.scala (line 118) of Spark 1.1.0, the TachyonFS.mkdir() method is called, which creates a directory in Tachyon. Right after that, the TachyonFS.getFile() method is called. In all the versions of Tachyon I tried (0.4.1, 0.4.0), the second method will return
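Based on that description, the sequence in question boils down to something like this sketch against the Tachyon 0.4.x client API (host, port, and path are placeholders, and the exact client signatures are an assumption here; only mkdir() and getFile() are named in the report):

val tfs = tachyon.client.TachyonFS.get("tachyon://tachyon-master:19998")   // assumed way to obtain the client
tfs.mkdir("/spark/app-dir")                  // creates the directory in Tachyon
val file = tfs.getFile("/spark/app-dir")     // per the report, returns null in 0.4.0/0.4.1 right after mkdir
if (file == null) {
  // this null is what trips up Spark 1.1.0's TachyonBlockManager, per the thread
}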

Cannot create dir in Tachyon when running Spark with OFF_HEAP caching (FileDoesNotExistException)

2014-07-07 Thread Teng Long
Hi guys, I'm running Spark 1.0.0 with Tachyon 0.4.1, both in single-node mode. Tachyon's own tests (./bin/tachyon runTests) work fine, and manual file system operations like mkdir work well. But when I tried to run a very simple Spark task with the RDD persisted as OFF_HEAP, I got the following