Subject: Re: Are tachyon and akka removed from 2.1.1 please
thanks Gene.
---Original---
From: "Gene Pang"<gene.p...@gmail.com>
Date: 2017/5/22 22:19:47
To: "萝卜丝炒饭"<1427357...@qq.com>;
Cc: "user"<user@spark.apache.org>;
Subject: Re: Are tachyon and akka removed from 2.1.1 please
Akka has been replaced by netty in 1.6
Le 22 mai 2017 15:25, "Chin Wei Low" <lowchin...@gmail.com> a écrit :
> I think akka has been removed since 2.0.
>
> On 22 May 2017 10:19 pm, "Gene Pang" <gene.p...@gmail.com> wrote:
I think akka has been removed since 2.0.
On 22 May 2017 10:19 pm, "Gene Pang" <gene.p...@gmail.com> wrote:
Hi,
Tachyon has been renamed to Alluxio. Here is the documentation for running
Alluxio with Spark
<http://www.alluxio.org/docs/master/en/Running-Spark-on-Alluxio.html>.
Hope this helps,
Gene
On Sun, May 21, 2017 at 6:15 PM, 萝卜丝炒饭 <1427357...@qq.com> wrote:
Hi all,
I read some papers about the source code; they are based on version 1.2 and refer
to Tachyon and Akka. When I read the 2.1 code, I cannot find the code about
Akka and Tachyon.
Are Tachyon and Akka removed from 2.1.1, please?
Hi,
If you are looking for how to run Spark on Alluxio (formerly Tachyon),
here is the documentation from the Alluxio doc site:
http://www.alluxio.org/docs/master/en/Running-Spark-on-Alluxio.html
It still works for Spark 2.x.
The Alluxio team also published articles on when and why to run Spark (2.x
Here is my understanding.
Spark used Tachyon as an off-heap solution for RDDs. In certain situations, it
would alleviate garbage collection of the RDDs.
Tungsten, Spark 2's off-heap (columnar format), is much more efficient and used
as the default. Alluxio no longer makes sense for this use case.
Hi folks,
What has happened with Tachyon / Alluxio in Spark 2? The docs no longer
mention it.
--
Oleksiy Dyagilev
Hi Calvin, I am running Spark KMeans on 24GB of data in a c3.2xlarge AWS
instance with 30GB of physical memory.
Spark will cache data off-heap to Tachyon, the input data is also stored in
Tachyon.
Tachyon is configured to use 15GB of memory, and to use a tiered store.
Tachyon underFS is /tmp.
The only
Hey Jia Zou,
I'm curious about this exception. The error log you showed indicates that the
exception is related to unlockBlock. Could you upload your full master.log
and worker.log under the tachyon/logs directory?
Best,
Cheng
On Friday, January 29, 2016 at 11:11:19 AM UTC+8, Calvin Jia wrote:
Hi,
Thanks for the detailed information. How large is the dataset you are
running against? Also did you change any Tachyon configurations?
Thanks,
Calvin
-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
Dears, I keep getting the exception below when using Spark 1.6.0 on top of
Tachyon 0.8.2. Tachyon is 93% used and configured as CACHE_THROUGH.
Any suggestions will be appreciated, thanks!
=
Exception in thread "main" org.apache.spark.SparkException: J
BTW, the Tachyon worker log says the following:
2015-12-27 01:33:44,599 ERROR WORKER_LOGGER
(WorkerBlockMasterClient.java:getId) - java.net.SocketException: Connection
reset
org.apache.thrift.transport.TTransportException: java.net.SocketException:
Connection reset
)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
BTW, the error happens when configuring Spark to read the input file from
Tachyon like the following:
/home/ubuntu/spark-1.6.0/bin/spark-submit --properties-file
/home/ubuntu/HiBench/report/kmeans/spark/java/conf/sparkbench/spark.conf
--class org.apache.spark.examples.mllib.JavaKMeans --master spark://ip
Hi,
You should be able to point Hive to Tachyon instead of HDFS, and that
should allow Hive to access data in Tachyon. If Spark SQL was pointing to
an HDFS file, you could instead point it to a Tachyon file, and that should
work too.
Hope that helps,
Gene
On Wed, Jan 20, 2016 at 2:06 AM, Sea
Hi all, I want to mount some Hive tables in Tachyon, but I don't know how to
query the data in Tachyon with spark-sql. Does anyone know?
Hi Mark,
Were you able to successfully store the RDD with Akhil's method? When you
read it back as an objectFile, you will also need to specify the correct
type.
You can find more information about integrating Spark and Tachyon on this
page: http://tachyon-project.org/documentation/Running-Spark
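A minimal sketch of the roundtrip Calvin describes, showing where the element type is specified on the read side. `MyRecord` and the Tachyon address are placeholders, not from the original thread:

```scala
// Save an RDD of records to Tachyon as an object file, then read it back.
// The type parameter on objectFile tells Spark what to deserialize into.
case class MyRecord(id: Long, name: String)

val records = sc.parallelize(Seq(MyRecord(1L, "a"), MyRecord(2L, "b")))
records.saveAsObjectFile("tachyon://localhost:19998/records")

// Reading back without the correct type parameter fails at use time,
// so state it explicitly here.
val restored = sc.objectFile[MyRecord]("tachyon://localhost:19998/records")
restored.count()
```

This assumes a running Tachyon master on localhost:19998 and a Spark shell (`sc` already defined), so it is a sketch rather than a standalone program.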
Hi all,
Well, I have some questions about Tachyon and Spark. I found that the
interaction between Spark and Tachyon is caching RDDs off-heap. I wonder
if you use Tachyon frequently, for example caching RDDs in Tachyon. Does this
action (caching RDDs in Tachyon) have a profound effect in accelerating
Spark
I guess you can do a .saveAsObjectFile and read it back as sc.objectFile
Thanks
Best Regards
On Fri, Oct 23, 2015 at 7:57 AM, mark <manwoodv...@googlemail.com> wrote:
Hi Shane,
Tachyon provides an api to get the block locations of the file which Spark
uses when scheduling tasks.
Hope this helps,
Calvin
On Fri, Oct 23, 2015 at 8:15 AM, Kinsella, Shane <shane.kinse...@aspect.com>
wrote:
Hi all,
I am looking into how Spark handles data locality with respect to Tachyon. My
main concern is how this is coordinated. Will it send a task based on a file
loaded from Tachyon to a node that it knows has that file locally, and how does
it know which nodes have what?
Kind regards,
Shane
I have Avro records stored in Parquet files in HDFS. I want to read these
out as an RDD and save that RDD in Tachyon for any spark job that wants the
data.
How do I save the RDD in Tachyon? What format do I use? Which RDD
'saveAs...' method do I want?
Thanks
Hi Dibyendu,
I am not sure I understand completely, but are you suggesting that
currently there is no way for the checkpoint directory to be in Tachyon?
Thanks
Nikunj
On Fri, Sep 25, 2015 at 11:49 PM, Dibyendu Bhattacharya <
dibyendu.bhattach...@gmail.com> wrote:
> Hi,
>
Hi,
Recently I was working on a PR to use Tachyon as the OFF_HEAP store for Spark
Streaming, and to make sure Spark Streaming can recover from driver failure and
recover the blocks from Tachyon.
The motivation for this PR is:
If the Streaming application stores the blocks OFF_HEAP, it may not need
> without using any WAL-like feature, because blocks are already available in
> Tachyon. The metadata checkpoint helps to recover the metadata about past
> received blocks.
>
> Now the question is , can I configure Tachyon as my Metadata Checkpoint
> location ? I tried that ,
pointing and WAL. You do not need to
>> enable data checkpointing.
>>
>> From my experiments and the PR I mentioned, I configured the metadata
>> checkpointing in HDFS, and stored the received blocks OFF_HEAP. And I
>> did not use any WAL . The PR I proposed woul
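The setup described above can be sketched as follows. This is only an illustration under assumed host names (tachyon-master, namenode, source-host) and Spark 1.5/1.6-era configuration keys; it is not the code from the PR:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("OffHeapStreaming")
  // Off-heap blocks go to Tachyon via the external block store.
  .set("spark.externalBlockStore.url", "tachyon://tachyon-master:19998")
  // WAL left disabled; received blocks survive in Tachyon instead.
  .set("spark.streaming.receiver.writeAheadLog.enable", "false")

val ssc = new StreamingContext(conf, Seconds(10))
// Metadata checkpointing goes to HDFS, not Tachyon.
ssc.checkpoint("hdfs://namenode:8020/checkpoints/offheap-app")

// Received blocks are stored off-heap.
val lines = ssc.socketTextStream("source-host", 9999, StorageLevel.OFF_HEAP)
lines.count().print()
ssc.start()
```

The key point is the split: block data recovery relies on Tachyon (OFF_HEAP), while metadata recovery relies on the HDFS checkpoint directory.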
I am using Tachyon in the Spark program below, but I encounter a
BlockNotFoundException.
Does someone know what's wrong, and is there a guide on how to configure
Spark to work with Tachyon? Thanks!
conf.set("spark.externalBlockStore.url", "tachyon://10.18.19.33:19998")
conf.set
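For context, a minimal sketch of how the external block store is wired up before OFF_HEAP persistence (Spark 1.5/1.6 era; the master address and base directory below are placeholders, not the poster's values):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Point the external block store at the Tachyon master before creating
// the SparkContext.
val conf = new SparkConf()
  .setAppName("OffHeapExample")
  .set("spark.externalBlockStore.url", "tachyon://tachyon-master:19998")
  .set("spark.externalBlockStore.baseDir", "/tmp_spark_tachyon")
val sc = new SparkContext(conf)

// OFF_HEAP persistence stores the RDD blocks in Tachyon. If the Tachyon
// master is unreachable, or blocks are evicted from the Tachyon cache,
// reads can fail with errors like the BlockNotFoundException above.
val rdd = sc.parallelize(1 to 1000000).persist(StorageLevel.OFF_HEAP)
rdd.count()
```

As the next message explains, eviction under memory pressure is one way blocks written with TRY_CACHE semantics can disappear before Spark reads them back.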
Sometime back I was playing with Spark and Tachyon and I also found this
issue. The issue here is that TachyonBlockManager puts the blocks with the
WriteType.TRY_CACHE configuration. Because of this, blocks are evicted
from the Tachyon cache when memory is full, and when Spark tries to find the
block it throws
The URL seems to have changed; here is the new one:
http://tachyon-project.org/documentation/Tiered-Storage-on-Tachyon.html
On Wed, Aug 26, 2015 at 12:32 PM, Dibyendu Bhattacharya
dibyendu.bhattach...@gmail.com wrote:
Exactly!
The sharing part is used in the Spark Notebook (this one
https://github.com/andypetrella/spark-notebook/blob/master/notebooks/Tachyon%20Test.snb)
so we can share things between notebooks, which are different SparkContexts
(in different JVMs).
OTOH, we have a project that creates micro services
Thanks Calvin - much appreciated !
-Abhishek-
On Aug 7, 2015, at 11:11 AM, Calvin Jia jia.cal...@gmail.com wrote:
Hi Abhishek,
Here's a production use case that may interest you:
http://www.meetup.com/Tachyon/events/222485713/
Baidu is using Tachyon to manage more than 100 nodes
Spark is an in-memory engine and attempts to do computation in-memory.
Tachyon is memory-centric distributed storage, OK, but how would that help
run Spark faster?
Looks like you would get better response on Tachyon's mailing list:
https://groups.google.com/forum/?fromgroups#!forum/tachyon-users
Cheers
On Fri, Aug 7, 2015 at 9:56 AM, Abhishek R. Singh
abhis...@tetrationanalytics.com wrote:
Do people use Tachyon in production, or is it experimental grade still?
Regards,
Abhishek
Hi,
Tachyon http://tachyon-project.org manages memory off heap which can help
prevent long GC pauses. Also, using Tachyon will allow the data to be
shared between Spark jobs if they use the same dataset.
Here's http://www.meetup.com/Tachyon/events/222485713/ a production use
case where Baidu
Hi June,
As I understand your problem, you are running Spark 1.3 and want to use
Tachyon with it. What you need to do is simply build the latest Spark and
Tachyon and set some configuration in Spark. In fact, Spark 1.3 has
spark/core/pom.xml; you have to find the core folder in your Spark home
of Spark Streaming with Tachyon as
OFF_HEAP block store. As I said in an earlier email, I was able to solve the
BlockNotFound exception when I used the Hierarchical Storage of Tachyon,
which is good.
I continue doing some testing around storing the Spark Streaming WAL and
CheckPoint files also
Hi Tathagata,
Thanks for looking into this. Investigating further, I found that the issue
is that Tachyon does not support file append. The streaming receiver, which
writes to the WAL, when failed and restarted is not able to append to the
same WAL file after restart.
I raised this with the Tachyon user
Dear all,
We're organizing a meetup http://www.meetup.com/Tachyon/events/222485713/ on
May 28th at IBM in Foster City that might be of interest to the Spark
community. The focus is a production use case of Spark and Tachyon at Baidu.
You can sign up here: http://www.meetup.com/Tachyon/events
Hi,
You can apply this patch https://github.com/apache/spark/pull/5354 and
recompile.
Hope this helps,
Calvin
On Tue, Apr 28, 2015 at 1:19 PM, sara mustafa eng.sara.must...@gmail.com
wrote:
Hi Zhang,
How did you compile Spark 1.3.1 with Tachyon? When I changed the Tachyon
version
to 0.6.3
I have a cluster launched with spark-ec2.
I can see a TachyonMaster process running,
but I do not seem to be able to use Tachyon from the spark-shell.
If I try
rdd.saveAsTextFile("tachyon://localhost:19998/path")
I get
15/04/24 19:18:31 INFO TaskSetManager: Starting task 12.2 in stage 1.0 (TID
zhangxiongfei0...@163.com
wrote:
Hi,
I did some tests on Parquet files with the Spark SQL DataFrame API.
I generated 36 gzip-compressed Parquet files with Spark SQL and stored them
on Tachyon. The size of each file is about 222M. Then I read them with the
code below.
val tfs = sqlContext.parquetFile
Would you mind opening a JIRA for this?
I think your suspicion makes sense. Will have a look at this tomorrow.
Thanks for reporting!
Cheng
On 4/13/15 7:13 PM, zhangxiongfei wrote:
Hi experts,
I ran the code below in the Spark shell to access Parquet files in Tachyon.
1. First, created a DataFrame
Response inline.
On Tue, Mar 31, 2015 at 10:41 PM, Sean Bigdatafun sean.bigdata...@gmail.com
wrote:
(resending...)
I was thinking the same setup… But the more I think about this problem, the
more interesting it becomes.
If we allocate 50% total memory to Tachyon statically, then the Mesos
benefits of dynamically scheduling resources go away altogether.
Can Tachyon be resource managed
Hi,
I am fairly new to the Spark ecosystem and I have been trying to set up
a Spark-on-Mesos deployment. I can't seem to figure out the best
practices around HDFS and Tachyon. The documentation about Spark's
data-locality section seems to point
Tachyon should be co-located with Spark in this case.
Best,
Haoyuan
On Tue, Mar 31, 2015 at 4:30 PM, Ankur Chauhan achau...@brightcove.com
wrote:
Hi Haoyuan,
So on each mesos slave node I should allocate/section off some amount
of memory for tachyon (let's say 50% of the total memory) and the rest
for regular mesos tasks?
This means, on each slave node I would have tachyon worker (+ hdfs
You are hitting https://issues.apache.org/jira/browse/SPARK-6330. It has
been fixed in 1.3.1, which will be released soon.
On Fri, Mar 27, 2015 at 10:42 PM, sud_self 852677...@qq.com wrote:
The Spark version is 1.3.0 with tachyon-0.6.1.
QUESTION DESCRIPTION: rdd.saveAsObjectFile("tachyon://host:19998/test") and
rdd.saveAsTextFile("tachyon://host:19998/test") succeed, but
rdd.toDF().saveAsParquetFile("tachyon://host:19998/test") fails.
ERROR MESSAGE
Thanks, Jerry.
I got it that way. Just wanted to make sure whether there is some option to
directly specify the Tachyon version.
fightf...@163.com
From: Shao, Saisai
Date: 2015-03-16 11:10
To: fightf...@163.com
CC: user
Subject: RE: Building spark over specified tachyon
I think you could change the pom file under the Spark project to update the
Tachyon-related dependency version and rebuild it again (in case the API is
compatible and the behavior is the same).
I'm not sure whether there is any command you can use to compile against a
specific Tachyon version.
Thanks
Jerry
Hi all,
Noting that the current Spark releases are built with Tachyon 0.5.0:
if we want to recompile Spark with Maven, targeting a specific Tachyon
version (let's say the most recent 0.6.0 release),
how should that be done? What should the Maven compile command look like?
Thanks,
Sun
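As the replies below suggest, there was no dedicated build flag at the time; the workaround is editing the pom and rebuilding. A rough, untested sketch of that process (the exact pom element name varies by Spark release, so locate it first):

```shell
# Find where the Tachyon client version is pinned in the Spark source tree.
grep -rn "tachyon" pom.xml core/pom.xml

# Bump the version in place (adjust the pattern to the element you found;
# 0.5.0 -> 0.6.0 here mirrors the versions discussed in this thread).
sed -i 's/0\.5\.0/0.6.0/' core/pom.xml

# Rebuild Spark against the new dependency.
mvn -DskipTests clean package
```

This only works if the Tachyon client API is compatible between the two versions, which is exactly the caveat Jerry raises below; otherwise a source-level patch (like the PR Haoyuan links) is needed.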
Thanks haoyuan.
fightf...@163.com
From: Haoyuan Li
Date: 2015-03-16 12:59
To: fightf...@163.com
CC: Shao, Saisai; user
Subject: Re: RE: Building spark over specified tachyon
Here is a patch: https://github.com/apache/spark/pull/4867
On Sun, Mar 15, 2015 at 8:46 PM, fightf...@163.com fightf
Did you try something like:
myRDD.saveAsObjectFile("tachyon://localhost:19998/Y")
val newRDD = sc.objectFile[MyObject]("tachyon://localhost:19998/Y")
Thanks
Best Regards
On Sun, Mar 8, 2015 at 3:59 PM, Yijie Shen henry.yijies...@gmail.com
wrote:
Hi,
I would like to share an RDD among several Spark applications,
i.e., create one in application A, publish the ID somewhere, and get the RDD
back directly using the ID in application B.
I know I can use Tachyon just as a filesystem and
s.saveAsTextFile("tachyon://localhost:19998/Y") like
Hi all,
Is it possible to store Spark shuffle files in a distributed memory store like
Tachyon instead of spilling them to disk?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Store-the-shuffled-files-in-memory-using-Tachyon-tp21944.html
Sent from the Apache
Agreed with Jerry. Aside from Tachyon, seeing this for general debugging
would be very helpful.
Haoyuan, is that feature you are referring to related to
https://issues.apache.org/jira/browse/SPARK-975?
In the interim, I've found the toDebugString() method useful (but it
renders execution
Jerry,
Great question. Spark and Tachyon capture lineage information at different
granularities. We are working on an integration between Spark and Tachyon for
this. Hope to get it ready to be released soon.
Best,
Haoyuan
On Fri, Jan 2, 2015 at 12:24 PM, Jerry Lam chiling...@gmail.com wrote
Hi spark developers,
I was thinking it would be nice to extract the data lineage information
from a data processing pipeline. I assume that spark/tachyon keeps this
information somewhere. For instance, a data processing pipeline uses
datasource A and B to produce C. C is then used by another
bumping this thread up
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/UpdateStateByKey-persist-to-Tachyon-tp20798p20930.html
IMHO: cache doesn't provide redundancy, and it's in the same JVM, so it's much
faster.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-on-Tachyon-tp1463p20800.html
All seem to run OK. However, I got no web UIs for the Spark master or slave.
Logging into the nodes, I see HDFS and Tachyon processes but none for Spark.
The /root/tachyon folder has a full complement of files including conf,
logs and so forth:
$ ls /root/tachyon
bin docs libexec logs
StorageLevel.OFF_HEAP requires running Tachyon:
http://spark.apache.org/docs/latest/programming-guide.html
If you don't know if you have tachyon or not, you probably don't :)
http://tachyon-project.org/
For local testing, you can use other persist() solutions without running
Tachyon.
Best
the following errors. It is related to Tachyon. But I don't know if
I have tachyon or not.
14/11/21 14:17:54 WARN storage.TachyonBlockManager: Attempt 1 to create
tachyon dir null failed
java.io.IOException: Failed to connect to master localhost/127.0.0.1:19998
after 5 attempts
Is it possible to store Spark shuffle files on Tachyon?
Hi,
Did you manage to figure this out? I would appreciate it if you could share
the answer.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/spark-ec2-script-with-Tachyon-tp9996p15249.html
Spark 1.0.2, Tachyon 0.4.1, Hadoop 1.0 (standard EC2 config)
scala> val gdeltT =
sqlContext.parquetFile("tachyon://172.31.42.40:19998/gdelt-parquet/1979-2005/")
14/08/21 19:07:14 INFO :
initialize(tachyon://172.31.42.40:19998/gdelt-parquet/1979-2005,
Configuration: core-default.xml, core-site.xml
The underFS is HDFS btw.
On Thu, Aug 21, 2014 at 12:22 PM, Evan Chan velvia.git...@gmail.com wrote:
And it worked earlier with a non-Parquet directory.
Hi folks,
We've posted the first Tachyon meetup, which will be on August 25th and is
hosted by Yahoo! (Limited Space):
http://www.meetup.com/Tachyon/events/200387252/ . Hope to see you there!
Best,
Haoyuan
--
Haoyuan Li
AMPLab, EECS, UC Berkeley
http://www.cs.berkeley.edu/~haoyuan/
Fantastic!
Sent while mobile. Pls excuse typos etc.
On Aug 19, 2014 4:09 PM, Haoyuan Li haoyuan...@gmail.com wrote:
spark.speculation was not set; is there any speculative execution on the Tachyon side?
tachyon-env.sh only changed following
export TACHYON_MASTER_ADDRESS=test01.zala
#export TACHYON_UNDERFS_ADDRESS=$TACHYON_HOME/underfs
export TACHYON_UNDERFS_ADDRESS=hdfs://test01.zala:8020
export TACHYON_WORKER_MEMORY_SIZE
More interesting: if spark-shell is started on the master node (test01),
then
parquetFile.saveAsParquetFile("tachyon://test01.zala:19998/parquet_tablex")
14/08/12 11:42:06 INFO : initialize(tachyon://...
...
...
14/08/12 11:42:06 INFO : File does not exist:
tachyon://test01.zala:19998/parquet_tablex
Sharing/reusing RDDs is always useful for many use cases. Is this possible
via persisting the RDD on Tachyon?
Such as off-heap persisting a named RDD into a given path (instead of
/tmp_spark_tachyon/spark-xxx-xxx-xxx)
or
saveAsParquetFile on Tachyon.
I tried to save a SchemaRDD on Tachyon:
val
Is the speculative execution enabled?
Best,
Haoyuan
On Mon, Aug 11, 2014 at 8:08 AM, chutium teng@gmail.com wrote:
We are investigating various ways to integrate with Tachyon. I'll note
that you can already use saveAsParquetFile and
parquetFile(...).registerAsTable(tableName) (soon to be registerTempTable
in Spark 1.1) to store data in Tachyon and query it with Spark SQL.
On Fri, Aug 1, 2014 at 1:42 AM
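A minimal sketch of what Michael describes, using the Spark 1.0/1.1-era API. The paths, table name, and host names are placeholders, not from the thread:

```scala
// Write data to Tachyon as Parquet, read it back, and query it with
// Spark SQL. Assumes a Tachyon master at tachyon-master:19998.
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)

// Copy an existing Parquet dataset into Tachyon.
val events = sqlContext.parquetFile("hdfs://namenode:8020/events")
events.saveAsParquetFile("tachyon://tachyon-master:19998/events")

// Point Spark SQL at the Tachyon-backed copy and register it as a table.
val cached = sqlContext.parquetFile("tachyon://tachyon-master:19998/events")
cached.registerAsTable("events") // registerTempTable from Spark 1.1 on

sqlContext.sql("SELECT COUNT(*) FROM events").collect()
```

Because the table is backed by files in Tachyon rather than a Spark-internal cache, any application that can reach the Tachyon path can register and query the same data.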
Hi,
It seems that spark-ec2 script deploys Tachyon module along with other
setup.
I am trying to use .persist(OFF_HEAP) for RDD persistence, but on worker I
see this error
--
Failed to connect (2) to master localhost/127.0.0.1:19998 :
java.net.ConnectException: Connection refused
--
More updates:
Seems in TachyonBlockManager.scala (line 118) of Spark 1.1.0, the
TachyonFS.mkdir() method is called, which creates a directory in Tachyon.
Right after that, TachyonFS.getFile() method is called. In all the versions
of Tachyon I tried (0.4.1, 0.4.0), the second method will return
Hi guys,
I'm running Spark 1.0.0 with Tachyon 0.4.1, both in single node mode.
Tachyon's own tests (./bin/tachyon runTests) work fine, and manual file
system operations like mkdir work well. But when I tried to run a very
simple Spark task with RDD persist as OFF_HEAP, I got the following