You don't need to. The memory is not statically allocated to the RDD cache; that setting is just an upper limit.
If the RDD cache does not use up the memory, it remains available for other usage, except for the parts also controlled by other memoryFraction settings, e.g. spark.shuffle.memoryFraction, for which you also set the upper limit.
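A minimal sketch of how these upper limits are typically set (Spark 1.x era config keys; the fraction values are illustrative only):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("cache-limit-example")
  .set("spark.storage.memoryFraction", "0.5")   // upper limit for the RDD cache
  .set("spark.shuffle.memoryFraction", "0.2")   // upper limit for shuffle aggregation buffers
val sc = new SparkContext(conf)

Memory under these caps that the cache or shuffle does not actually use stays available to ordinary task execution.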
Regards,
Raymond Liu
From: 牛兆捷 [mailto:nzjem...@gmail.com]
Sent: Thursday, September 04, 2014 2:57 PM
To: Liu, Raymond
Cc: Patrick Wendell; u...@spark.apache.org; dev@spark.apache.org
Subject: Re: memory size for caching RDD
Oh I see.
I want to implement something like this: sometimes I need
Actually, a replicated RDD and a parallel job on the same RDD are two unrelated concepts.
A replicated RDD just stores its data on multiple nodes; that helps with HA and provides a better chance of data locality. It is still one RDD, not two separate RDDs.
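For reference, a minimal sketch of what a replicated cached RDD looks like in practice (assuming an existing SparkContext sc and a hypothetical input path):

import org.apache.spark.storage.StorageLevel

val data = sc.textFile("hdfs:///some/input")   // hypothetical path
data.persist(StorageLevel.MEMORY_ONLY_2)        // the _2 level keeps two replicas of each cached partition
data.count()                                    // still one logical RDD, just stored twice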
While regarding run two jobs on
Not sure what you are referring to by replicated RDD. If you actually mean RDD, then yes, read the API doc and paper as Tobias mentioned.
If you actually focus on the word replicated, then that is for fault tolerance, and probably mostly used in the streaming case for receiver-created
AFAIK, No.
Best Regards,
Raymond Liu
From: 牛兆捷 [mailto:nzjem...@gmail.com]
Sent: Thursday, September 04, 2014 11:30 AM
To: user@spark.apache.org
Subject: resize memory size for caching RDD
Dear all:
Spark uses memory to cache RDDs, and the cache size is specified by
You could use cogroup to combine the RDDs into one RDD for cross-reference processing, e.g.
a.cogroup(b).filter { case (_, (l, r)) => l.nonEmpty && r.nonEmpty }.map { case (k, (l, r)) => (k, l) }
Best Regards,
Raymond Liu
-Original Message-
From: marylucy [mailto:qaz163wsx_...@hotmail.com]
Sent:
,
Raymond Liu
From: Victor Tso-Guillen [mailto:v...@paxata.com]
Sent: Wednesday, August 27, 2014 1:40 PM
To: Liu, Raymond
Cc: user@spark.apache.org
Subject: Re: What is a Block Manager?
We're a single-app deployment so we want to launch as many executors as the
system has workers. We accomplish
Basically, a Block Manager manages the storage for most of the data in Spark; to name a few: blocks that represent cached RDD partitions, intermediate shuffle data, broadcast data, etc. There is one per executor, and in standalone mode you normally have one executor per worker.
You don't control how
You can try to manipulate the string you want to output before saveAsTextFile, something like:
modify.flatMap(x => x).map { x =>
  val s = x.toString
  s.subSequence(1, s.length - 1)
}
There should be a more optimized way.
Best Regards,
Raymond Liu
-Original Message-
From: yh18190
I could not find examples/flink-java-examples-*-KMeansIterative.jar in the trunk code. Has it been removed? If so, the run_example_quickstart doc needs to be updated too.
Best Regards,
Raymond Liu
So how do I run the check locally?
On the master tree, sbt mimaReportBinaryIssues seems to report a lot of errors. Do we need to modify SparkBuild.scala etc. to run it locally? I could not figure out how Jenkins runs the check from its console output.
Best Regards,
Raymond Liu
-Original
Hi
I just cloned the code from incubator Flink; I tried both the trunk code and the 0.5.1 release and encountered the same problem.
# mvn --version
Apache Maven 3.0.4
Maven home: /usr/share/maven
Java version: 1.6.0_30, vendor: Sun Microsystems Inc.
Java home: /usr/java/jdk1.6.0_30/jre
Default locale:
Actually, I tried JDK 1.7.0 and it works. However, the project README says that Java 6, 7, or 8 all work...
Best Regards,
Raymond Liu
-Original Message-
From: Liu, Raymond [mailto:raymond@intel.com]
Sent: Monday, June 30, 2014 4:13 PM
To: dev@flink.incubator.apache.org
Subject: Unable
,
Robert
On Mon, Jun 30, 2014 at 10:20 AM, Liu, Raymond raymond@intel.com
wrote:
Actually, tried jdk 1.7.0. It works. While the project readme said
that Java 6, 7 or 8 both works...
Best Regards,
Raymond Liu
-Original Message-
From: Liu, Raymond [mailto:raymond@intel.com
I think there is a shuffle stage involved, and the future count job will depend directly on the first job's shuffle stage output data, as long as it is still available. Thus it will be much faster.
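A small sketch of the effect being described, assuming a word-count style job (the input path is hypothetical):

import org.apache.spark.SparkContext._   // needed for reduceByKey on pre-1.3 Spark

val counts = sc.textFile("hdfs:///input/words")   // hypothetical input
  .flatMap(_.split(" "))
  .map(w => (w, 1))
  .reduceByKey(_ + _)                             // introduces a shuffle stage
counts.count()   // first job: runs the shuffle and leaves its map output on disk
counts.count()   // second job: skips the shuffle stage and reuses the existing output, so it is much faster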
Best Regards,
Raymond Liu
From: tomsheep...@gmail.com [mailto:tomsheep...@gmail.com]
Sent:
If some tasks have no locality preference, they will also show up as PROCESS_LOCAL; I think we probably need to name that NO_PREFER to make it clearer. Not sure if this is your case.
Best Regards,
Raymond Liu
From: coded...@gmail.com [mailto:coded...@gmail.com] On Behalf Of Sung Hwan
Chung
It seems you are asking whether the Spark-related jars need to be deployed to the YARN cluster manually before you launch the application? No, you don't, just like any other YARN application. And it doesn't matter whether it is yarn-client or yarn-cluster mode.
Best Regards,
Raymond Liu
-Original
At the core, they are not that different.
In standalone mode, you have the Spark master and Spark workers, which allocate the driver and executors for your Spark app.
In YARN mode, the YARN resource manager and node managers do this work.
Once the driver and executors have been launched, the rest of
So it seems to me that, when running the full-path code in my previous case, 32 cores with 50 MB/s total throughput is reasonable?
Best Regards,
Raymond Liu
-Original Message-
From: Liu, Raymond [mailto:raymond@intel.com]
The latter case: total throughput aggregated across all cores.
per word occurrence.
On Tue, Apr 29, 2014 at 7:48 AM, Liu, Raymond raymond@intel.com wrote:
Hi Patrick
I am just doing a simple word count; the data is generated by the Hadoop random text writer.
This seems to me not quite related to compression. If I turn off compression on shuffle
Hi
I am running a WordCount program which counts words from HDFS, and I noticed that the serializer part of the code takes a lot of CPU time. On a 16-core/32-thread node, the total throughput is around 50 MB/s with JavaSerializer, and if I switch to KryoSerializer, it doubles to around
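For reference, switching serializers is just a configuration change; a minimal sketch (Spark 1.x config key, class name as shipped with Spark):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("WordCount")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")   // default is the Java serializer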
For all the tasks, say 32 tasks in total.
Best Regards,
Raymond Liu
-Original Message-
From: Patrick Wendell [mailto:pwend...@gmail.com]
Is this the serialization throughput per task or the serialization throughput
for all the tasks?
On Tue, Apr 29, 2014 at 9:34 PM, Liu, Raymond
directly instead of
read from HDFS, similar throughput result)
Best Regards,
Raymond Liu
-Original Message-
From: Liu, Raymond [mailto:raymond@intel.com]
For all the tasks, say 32 task on total
Best Regards,
Raymond Liu
-Original Message-
From: Patrick Wendell
, 2014 at 10:14 PM, Liu, Raymond raymond@intel.com wrote:
For all the tasks, say 32 task on total
Best Regards,
Raymond Liu
-Original Message-
From: Patrick Wendell [mailto:pwend...@gmail.com]
Is this the serialization throughput per task or the serialization throughput
Hi
I am running a simple word count program on a Spark standalone cluster. The cluster is made up of 6 nodes; each runs 4 workers, and each worker owns 10G memory and 16 cores, thus 96 cores and 240G memory in total. (Well, it also used to be configured as 1 worker with 40G memory on each node.)
is definitely a
bit strange, the data gets compressed when written to disk, but unless you have
a weird dataset (E.g. all zeros) I wouldn't expect it to compress _that_ much.
On Mon, Apr 28, 2014 at 1:18 AM, Liu, Raymond raymond@intel.com wrote:
Hi
I am running a simple word count program
It should be fixed in both the alpha and stable code bases, since we aim to support both versions.
Best Regards,
Raymond Liu
-Original Message-
From: Nan Zhu [mailto:zhunanmcg...@gmail.com]
Sent: Wednesday, February 12, 2014 10:29 AM
To: dev@spark.incubator.apache.org
Subject: Does yarn-stable
Not sure what you aim to solve. When you mention Spark Master, I guess you probably mean Spark standalone mode? In that case the Spark cluster is not necessarily coupled with the Hadoop cluster. But if you aim to achieve better data locality, then yes, running the Spark workers on the HDFS data nodes might
Hi
Regarding your question
1) When I run the above script, which jar is submitted to the YARN server?
Whatever the SPARK_JAR env points to and whatever --jar points to are both submitted to the YARN server.
2) It looks like spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar plays the role of
I think you could put the Spark jar and the other jars your app depends on that don't change often on HDFS, and use --files or --addJars (depending on the mode you run, YarnClient/YarnStandalone) to refer to them.
Then you only need to redeploy your thin app jar on each invocation.
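A hypothetical invocation along these lines, in the style of the 0.8.x YARN client command shown later in this thread; all paths and class names are placeholders and the exact flag spelling is an assumption, not a verified command:

SPARK_JAR=hdfs:///libs/spark-assembly.jar ./run spark.deploy.yarn.Client \
  --jar ./my-thin-app.jar \
  --class com.example.MyApp \
  --addJars hdfs:///libs/my-deps.jar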
Best Regards,
Raymond
Not found in which part of the code? If in the SparkContext thread, say on the AM, --addJars should work.
If in the tasks, then --addJars won't work; you need to use --files=local://xxx etc., though I'm not sure that is available in 0.8.1. Packaging everything into a single jar should also work; if it doesn't, something might be wrong
I think you also need to set yarn.version, say something like:
mvn -Pyarn -Dhadoop.version=2.2.0 -Dyarn.version=2.2.0 -DskipTests clean package
hadoop.version defaults to 2.2.0 but yarn.version does not when you choose the new-yarn profile. We probably need to fix that later for easier usage.
Sorry, that should be: mvn -Pnew-yarn -Dhadoop.version=2.2.0 -Dyarn.version=2.2.0 -DskipTests clean package
The profile in the previous mail is not yet available.
Best Regards,
Raymond Liu
-Original Message-
From: Liu, Raymond
Sent: Friday, January 03, 2014 2:09 PM
To: dev@spark.incubator.apache.org
Subject
@spark.incubator.apache.org
Subject: Re: compiling against hadoop 2.2
Specification of yarn.version can be inserted following this line (#762 in
pom.xml), right ?
<hadoop.version>2.2.0</hadoop.version>
On Thu, Jan 2, 2014 at 10:10 PM, Liu, Raymond raymond@intel.com wrote:
Sorry , mvn -Pnew-yarn
And I am not sure whether it is valuable to provide different settings for the hadoop/hdfs and yarn versions. When building with SBT, they will always be the same. Maybe in mvn we should do the same.
Best Regards,
Raymond Liu
-Original Message-
From: Liu, Raymond
Sent: Friday, January 03
Hi Izhar
Is that the exact command you are running? Say with 0.8.0 instead of
0.8.1 in the cmd?
Raymond Liu
From: Izhar ul Hassan [mailto:ezh...@gmail.com]
Sent: Friday, December 27, 2013 9:40 PM
To: user@spark.incubator.apache.org
Subject: Errors with spark-0.8.1 hadoop-yarn 2.2.0
Ido, when you say add external JARs, do you mean via --addJars, which adds jars for the SparkContext to use in the AM env?
If so, I think you don't need it for yarn-client mode at all; in yarn-client mode the SparkContext runs locally, so I think you just need to make sure those jars are in the
It's what the document says. For yarn-standalone mode, it will be the host where the Spark AM runs, while for yarn-client mode, it will be the local host where you run the cmd.
And what cmd did you use to run SparkPi? I think you actually don't need to set spark.driver.host manually for YARN mode,
: AppMaster received a signal.
13/12/17 11:07:13 WARN yarn.ApplicationMaster: Failed to connect to driver at
null:null, retrying ...
After retrying 'spark.yarn.applicationMaster.waitTries' times (default 10), the job failed.
On Tue, Dec 17, 2013 at 12:07 PM, Liu, Raymond
raymond@intel.commailto:raymond
-distributed?
On Tue, Dec 17, 2013 at 1:03 PM, Liu, Raymond
raymond@intel.commailto:raymond@intel.com wrote:
Hmm, I don't see which mode you are trying to use. Did you specify MASTER in a conf file?
I think in the running-on-yarn doc, the example for yarn-standalone mode mentions that you
Hi Azuryy
Please Check https://spark-project.atlassian.net/browse/SPARK-995 for this
protobuf version issue
Best Regards,
Raymond Liu
-Original Message-
From: Azuryy Yu [mailto:azury...@gmail.com]
Sent: Monday, December 16, 2013 10:30 AM
To: dev@spark.incubator.apache.org
Subject: Re:
I am not sure; it might have a problem. If it does, you might need to build Mesos against 2.5.0. I haven't tested that; if you have time, would you mind giving it a try?
Best Regards,
Raymond Liu
-Original Message-
From: Liu, Raymond [mailto:raymond@intel.com]
Sent: Monday, December 16, 2013 10:48 AM
Hi Patrick
What does it mean to drop YARN 2.2? It seems the code is still there.
You mean that if we build upon 2.2 it will break and won't work, right? Since the home-made Akka build on Scala 2.10 is not there. In that case, can we just use Akka 2.3-M1, which runs on protobuf 2.5
. Akka is the source of our hardest-to-find bugs and
simultaneously trying to support 2.2.3 and 2.3-M1 is a bit daunting.
Of course, if you are building off of master you can maintain a fork that
uses this.
- Patrick
On Thu, Dec 12, 2013 at 12:42 AM, Liu, Raymond raymond@intel.comwrote
of libjar within A and the
whole thing becomes moot anyway...
-Stephen
On 11 December 2013 00:52, Liu, Raymond raymond@intel.com wrote:
Thanks Stephen
I see your solution is to let B manage the libjar version. But that is against my wish; I want B to know nothing about A's internals
as the
final later after B and *override* the transitive dep on libjar in the two
fatjar building modules
On Tuesday, 10 December 2013, Liu, Raymond wrote:
Hi
I have a project with module A that will be built with or without
profile say -Pnewlib , thus I can have it build with different version
can then swap in any compatible version of libjar by providing it at
run-time.
Still not sure why you can not just use the latest version of libjar if any
version will work at run-time.
On 10/12/2013 7:52 PM, Liu, Raymond wrote:
Thanks Stephen
I see your solution is let B manage
Hi
I have a project with module A that can be built with or without a profile, say -Pnewlib; thus I can build it with different versions of a library dependency, say libjar-1.0 by default and libjar-2.0 when -Pnewlib is given.
Then I have a module B that depends on module A. The
YARN alpha API support is already there. If you mean the YARN stable API in Hadoop 2.2, it will probably be in 0.8.1.
Best Regards,
Raymond Liu
From: Pranay Tonpay [mailto:pranay.ton...@impetus.co.in]
Sent: Thursday, December 05, 2013 12:53 AM
To: user@spark.incubator.apache.org
Subject: Spark over
What version of the code are you using?
2.2.0 support is not yet merged into trunk. Check out
https://github.com/apache/incubator-spark/pull/199
Best Regards,
Raymond Liu
From: horia@gmail.com [mailto:horia@gmail.com] On Behalf Of Horia
Sent: Monday, December 02, 2013 3:00 PM
To:
Hi
It seems to me that a lot of Hadoop 2.2 support work has been done on trunk, but I cannot find any related documentation. Is there any doc I can refer to, say build/run BKMs etc.? Especially the parts related to the YARN stable API in Hadoop 2.2, since it changed a lot from alpha.
Best Regards,
Raymond
://kafka.apache.org/code.html
and build it yourself.
2013/11/8 Liu, Raymond raymond@intel.com:
If I want to use kafka_2.10 0.8.0-beta1, which repo should I go to?
It seems the Apache repo doesn't have it, while there are com.sksamuel.kafka and com.twitter.tormenta-kafka_2.10.
Which one should I
Hi
It seems to me that dev branches are kept in sync with master by continually merging trunk code, e.g. the scala-2.10 branch continuously merges the latest master code into itself.
I am wondering, what is the general guideline for doing this? It seems to me that not every piece of code in
a maintenance release of Akka that
supports protobuf 2.5.
None of these are ideal, but we'd have to pick one. It would be great if you
have other suggestions.
On Sun, Nov 3, 2013 at 11:46 PM, Liu, Raymond raymond@intel.com wrote:
Hi
I am working on porting spark onto Hadoop 2.2.0
Hi
I am encountering an issue where the executor actor cannot connect to the driver actor, but I cannot figure out the reason.
Say the Driver actor is listening on :35838
root@sr434:~# netstat -lpv
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address
I am also working on porting the trunk code onto 2.2.0. There seem to be quite a few API changes, but many of them are just renames.
YARN 2.1.0-beta also adds some client APIs for easier interaction with the YARN framework, but there are not many examples of how to use them (the API and wiki docs are both
You don't need to, if the wiki page is correct.
Best Regards,
Raymond Liu
From: ch huang [mailto:justlo...@gmail.com]
Sent: Tuesday, October 29, 2013 12:01 PM
To: user@hadoop.apache.org
Subject: if i configed NN HA,should i still need start backup node?
ATT
Hi
I am playing with YARN 2.2, trying to port some code from the pre-beta API onto the stable API. However, both the wiki doc and the API doc for 2.2.0 seem to still stick with the old API, though I could find some help from
Hi
I have set up a Hadoop 2.2.0 HA cluster following:
http://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html#Configuration_details
I can check both the active and standby namenodes through the web interface. However, it seems that the logical name could
I encountered a similar issue with the NN HA URL.
Have you made it work?
Best Regards,
Raymond Liu
-Original Message-
From: Siddharth Tiwari [mailto:siddharth.tiw...@live.com]
Sent: Friday, October 18, 2013 5:17 PM
To: user@hadoop.apache.org
Subject: Using Hbase with NN HA
Hi team,
Can Hbase be
Hmm, my bad. The NameserviceID was not in sync in one of the properties.
After fixing it, it works.
Best Regards,
Raymond Liu
-Original Message-
From: Liu, Raymond [mailto:raymond@intel.com]
Sent: Thursday, October 24, 2013 3:03 PM
To: user@hadoop.apache.org
Subject: How to use Hadoop2 HA's
Hi
I could run spark trunk code on top of yarn 2.0.5-alpha by
SPARK_JAR=./core/target/spark-core-assembly-0.8.0-SNAPSHOT.jar ./run
spark.deploy.yarn.Client \
--jar examples/target/scala-2.9.3/spark-examples_2.9.3-0.8.0-SNAPSHOT.jar \
--class spark.examples.SparkPi \
--args
/MAPREDUCE-3193.
You can give the job an input dir which doesn't have nested dirs, or you can make use of the old FileInputFormat API to read files recursively in the sub-dirs.
Thanks
Devaraj k
-Original Message-
From: Liu, Raymond [mailto:raymond@intel.com]
Sent: 12 July 2013 12
Hi
I have just started to try out Hadoop 2.0, using the 2.0.5-alpha package, and followed
http://hadoop.apache.org/docs/r2.0.5-alpha/hadoop-project-dist/hadoop-common/ClusterSetup.html
to set up a cluster in non-secure mode. HDFS works fine with the client tools, but when I run the wordcount example, there
Hi
If all the data is already in the RS block cache, then what's the typical scan latency for scanning a few rows from a, say, several-GB table (with dozens of regions) on a small cluster with, say, 4 RS?
A few ms? Tens of ms? Or more?
Best Regards,
Raymond Liu
ramkrishna.s.vasude...@gmail.com wrote:
What is that you are observing now?
Regards
Ram
On Mon, Jun 3, 2013 at 2:00 PM, Liu, Raymond raymond@intel.com
wrote:
Hi
If all the data is already in RS blockcache.
Then what's the typical scan latency for scan a few
How about this one : https://issues.apache.org/jira/browse/HBASE-8542
Best Regards,
Raymond Liu
-Original Message-
From: Lior Schachter [mailto:lior...@gmail.com]
Sent: Thursday, May 16, 2013 1:18 AM
To: user
Subject: Re: checkAnd...
yes, I believe this will cover most of the
at large scale better. I'm saying this
since you have one ID referencing another ID (using target ID).
On May 10, 2013, at 11:47 AM, Liu, Raymond raymond@intel.com
wrote:
Thanks, seems there are no other better solution?
Really need a GetAndPut atomic op here ...
You can do
Thanks. It seems there is no better solution?
I really need a GetAndPut atomic op here ...
You can do this by looping over a checkAndPut operation until it succeeds.
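A sketch of that loop against the HBase 0.94-era client API (the table, row, and column names are hypothetical, and it assumes the cell already exists):

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{Get, HTable, Put}
import org.apache.hadoop.hbase.util.Bytes

val table = new HTable(HBaseConfiguration.create(), "records")
val (row, cf, col) = (Bytes.toBytes("id-1"), Bytes.toBytes("f"), Bytes.toBytes("count"))
var done = false
while (!done) {
  val current = table.get(new Get(row)).getValue(cf, col)        // read the old value
  val put = new Put(row)
  put.add(cf, col, Bytes.toBytes(Bytes.toLong(current) + 1))     // build the new value from the old one
  done = table.checkAndPut(row, cf, col, current, put)           // succeeds only if the cell is unchanged; otherwise retry
}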
-Mike
On Thu, May 9, 2013 at 8:52 PM, Liu, Raymond raymond@intel.com
wrote:
Any suggestion?
Hi
Any suggestion?
Hi
Say, I have four field for one record :id, status, targetid, and count.
Status is on and off, target could reference other id, and count will
record
the number of on status for all targetid from same id.
The record could be add / delete, or
Hi
Say I have four fields for one record: id, status, targetid, and count.
Status is on or off, targetid can reference another id, and count records the number of 'on' statuses for all targetids from the same id.
The records can be added/deleted, or updated to change the
Btw, would it be possible or practical to implement something like PutAndGet, which puts in the new row and returns the old row back to the client? That would help a lot for my case.
Oh, I realize it would be better named GetAndMutate: mutate anyway, but return the
So what is lacking here? Should the action also be parallelized inside the RS for each region, instead of just parallelized at the RS level?
It seems this would be rather difficult to implement, and for Get it might not be worthwhile?
I looked
at src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
I guess Rob means using one query to query an RCFile table and an HBase table at the same time.
If your query is over two tables, one on RCFile and another on HBase through the HBase storage handler, I think that should be OK.
Best Regards,
Raymond Liu
What does a composite query mean? Hive's query doesn't
It seems to me that a major_compact command from the HBase shell does not flush the memstore? After the major compaction is done, there is still some data in the memstore, which gets flushed out to disk when I shut down the HBase cluster.
Best Regards,
Raymond Liu
is flushed? So a user-invoked compaction doesn't force a flush?
Best Regards,
Raymond Liu
Did you try from java api? If flush does not happen we may need to fix it.
Regards
RAm
On Tue, Mar 12, 2013 at 1:04 PM, Liu, Raymond raymond@intel.com
wrote:
It seems to me
that it will end up in a
single store file per region.
Best Regards,
Raymond Liu
Raymond:
Major compaction does not first flush. Should it or should it be an option?
St.Ack
On Tue, Mar 12, 2013 at 6:46 PM, Liu, Raymond raymond@intel.com
wrote:
I tried both hbase shell's
Just curious, wouldn't a ROWCOL bloom filter work for this case?
Best Regards,
Raymond Liu
As per the above said, you will need a full table scan on that CF.
As Ted said, consider having a look at your schema design.
-Anoop-
On Sun, Mar 10, 2013 at 8:10 PM, Ted Yu yuzhih...@gmail.com
(qualifier)
is present in an HFile or not. But the user doesn't know the rowkeys; he wants all the rows with column 'x'.
-Anoop-
From: Liu, Raymond [raymond@intel.com]
Sent: Monday, March 11, 2013 7:43 AM
To: user@hbase.apache.org
Subject: RE
Hi
Is there any way to balance just one table? I found that one of my tables is not balanced, while all the other tables are balanced, so I want to fix this table.
Best Regards,
Raymond Liu
: Is there any way to balance one table?
What version of HBase are you using ?
0.94 has per-table load balancing.
Cheers
On Tue, Feb 19, 2013 at 5:01 PM, Liu, Raymond raymond@intel.com
wrote:
Hi
Is there any way to balance just one table? I found one of my table is
not balanced
, February 20, 2013 9:09 AM
To: user@hbase.apache.org
Subject: Re: Is there any way to balance one table?
What version of HBase are you using ?
0.94 has per-table load balancing.
Cheers
On Tue, Feb 19, 2013 at 5:01 PM, Liu, Raymond raymond@intel.com
wrote:
Hi
@hbase.apache.org
Subject: Re: Is there any way to balance one table?
Hi Liu,
Why didn't you simply call the balancer? If the other tables are already balanced, it should not touch them and will only balance the table which is not
balancer?
JM
2013/2/19, Liu, Raymond raymond@intel.com
on this table?
Best Regards,
Raymond Liu
From: Marcos Ortiz [mailto:mlor...@uci.cu]
Sent: Wednesday, February 20, 2013 11:44 AM
To: user@hbase.apache.org
Cc: Liu, Raymond
Subject: Re: Is there any way to balance one table?
What is the size of your table?
On 02/19/2013 10:40 PM, Liu, Raymond wrote:
Hi
count on any server can be as far as 20% from average region
count.
You can tighten sloppiness.
On Tue, Feb 19, 2013 at 7:40 PM, Liu, Raymond raymond@intel.com
wrote:
Hi
I do call the balancer, but it seems it doesn't work. That might be because this table is small and the overall region
Hmm, in order to have the 96-region table balanced within 20% on a 3000-region cluster where all the other tables are balanced, the slop would need to be around 20%/30, say 0.006? Won't that be too small?
Yes, Raymond.
You should lower sloppiness.
On Tue, Feb 19, 2013 at 7:48 PM, Liu, Raymond
You mean slop is also applied per table?
Weird, then it should work for my case; let me check again.
Best Regards,
Raymond Liu
bq. On a 3000 region cluster
Balancing is per-table. Meaning total number of regions doesn't come into
play.
On Tue, Feb 19, 2013 at 7:55 PM, Liu, Raymond
the BlockSender.sendChunks will read and send data in 64 KB units?
Is that true? And if so, wouldn't that explain why reading through the datanode is faster, since it reads data in a bigger block size?
Best Regards,
Raymond Liu
-Original Message-
From: Liu, Raymond [mailto:raymond@intel.com
in 64K bytes units?
Is that true? And if so, won't it explain that read through datanode
will be faster? Since it read data in bigger block size.
Best Regards,
Raymond Liu
-Original Message-
From: Liu, Raymond [mailto:raymond@intel.com]
Sent: Saturday, February 16
Hi
I tried to use short-circuit read to improve my HBase cluster MR scan performance.
I have the following setting in hdfs-site.xml
dfs.client.read.shortcircuit set to true
dfs.block.local-path-access.user set to MR job runner.
The cluster is 1+4 node
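Rendered as hdfs-site.xml properties, those two settings would look roughly like this (the user value is just a placeholder for whatever account runs the MR tasks):

<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.block.local-path-access.user</name>
  <!-- placeholder: the user that runs the MR job -->
  <value>mapred</value>
</property>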
Did you enable the security feature in your cluster? There will be no obvious benefit to be found if so.
Regards,
Liang
___
From: Liu, Raymond [raymond@intel.com]
Sent: February 16, 2013 11:10
To: user@hadoop.apache.org
Subject: why my test result on dfs short
will
be attempted but will begin to fail.
On Sat, Feb 16, 2013 at 8:40 AM, Liu, Raymond raymond@intel.com
wrote:
Hi
I tried to use short circuit read to improve my hbase cluster MR
scan performance.
I have the following setting in hdfs-site.xml
for file
This would confirm that short circuit read is happening.
--
Arpit Gupta
Hortonworks Inc.
http://hortonworks.com/
On Feb 15, 2013, at 9:53 PM, Liu, Raymond raymond@intel.com wrote:
Hi Harsh
Yes, I did set both of these, though not in hbase-site.xml but in hdfs-site.xml. And I have
that read through datanode will be
faster? Since it read data in bigger block size.
Best Regards,
Raymond Liu
-Original Message-
From: Liu, Raymond [mailto:raymond@intel.com]
Sent: Saturday, February 16, 2013 2:23 PM
To: user@hadoop.apache.org
Subject: RE: why my test result on dfs
Is it also possible to control which disk the blocks are assigned to?
Say, when there are multiple disks on one node, I wish the blocks belonging to the local regions were distributed evenly across the disks.
At present, it seems that they are not, though if you take the non-local regions' replica blocks in
://blog.cloudera.com/blog/2012/03/hbase-hadoop-xceivers/ helpful. It explains the problem
It explains the problem
and solution in great detail.
Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
On Thu, Jan 17, 2013 at 12:14 PM, Liu, Raymond
raymond@intel.comwrote:
Hi
I have
Hi
I have Hadoop 1.1.1 and HBase 0.94.1, with around 300 regions on each region server. Right after the cluster is started, before I do anything, there are already around 500 CLOSE_WAIT connections from the regionserver process to the datanode process. Is that normal?
It seems there are a
On 1/4/13 10:37 PM, Liu, Raymond raymond@intel.com wrote:
Hi
I have encountered a weird lagging map task issue here:
I have a small Hadoop/HBase cluster with 1 master node and 4 regionserver nodes, all with 16 CPUs and with the map and reduce slots set to 24.
A few tables