RE: memory size for caching RDD

2014-09-04 Thread Liu, Raymond
You don't need to. The memory is not statically allocated to the RDD cache; it is just an upper limit. If the RDD cache doesn't use up the memory, it is always available for other usage, except the portions also controlled by some memoryFraction conf, e.g. spark.shuffle.memoryFraction, for which you also set the up
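
For illustration, a minimal sketch of the two limits being discussed, assuming the Spark 1.x-era configuration keys (the values shown are illustrative, not recommendations):

    import org.apache.spark.{SparkConf, SparkContext}

    // These fractions are upper limits, not static allocations: memory the
    // RDD cache does not use stays available for other usage.
    val conf = new SparkConf()
      .setAppName("cache-limits")
      .set("spark.storage.memoryFraction", "0.6")  // cap for the RDD cache
      .set("spark.shuffle.memoryFraction", "0.2")  // cap for shuffle aggregation buffers
    val sc = new SparkContext(conf)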

RE: memory size for caching RDD

2014-09-04 Thread Liu, Raymond
Regards, Raymond Liu From: 牛兆捷 [mailto:nzjem...@gmail.com] Sent: Thursday, September 04, 2014 2:57 PM To: Liu, Raymond Cc: Patrick Wendell; u...@spark.apache.org; dev@spark.apache.org Subject: Re: memory size for caching RDD Oh I see. I want to implement something like this: sometimes I need

RE: RDDs

2014-09-04 Thread Liu, Raymond
Actually, a replicated RDD and a parallel job on the same RDD are two unrelated concepts. A replicated RDD just stores data on multiple nodes; it helps with HA and provides a better chance of data locality. It is still one RDD, not two separate RDDs. As for running two jobs on
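
For illustration, a minimal sketch of what a replicated RDD looks like in code, assuming an existing SparkContext sc and a hypothetical input path; the _2 storage levels keep each cached partition on two nodes:

    import org.apache.spark.storage.StorageLevel

    // Still one RDD: each cached partition is simply stored on two nodes,
    // which helps with HA and improves the odds of data locality.
    val data = sc.textFile("hdfs:///input")
    data.persist(StorageLevel.MEMORY_ONLY_2)
    data.count()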

RE: RDDs

2014-09-03 Thread Liu, Raymond
Not sure what you are referring to by replicated rdd. If you actually mean RDD, then yes, read the API doc and the paper, as Tobias mentioned. If you are actually focusing on the word replicated, then that is for fault tolerance, and it is probably mostly used in the streaming case for receiver-created

RE: resize memory size for caching RDD

2014-09-03 Thread Liu, Raymond
AFAIK, no. Best Regards, Raymond Liu From: 牛兆捷 [mailto:nzjem...@gmail.com] Sent: Thursday, September 04, 2014 11:30 AM To: user@spark.apache.org Subject: resize memory size for caching RDD Dear all: Spark uses memory to cache RDDs, and the memory size is specified by

RE: how to filter value in spark

2014-08-31 Thread Liu, Raymond
You could use cogroup to combine RDDs into one RDD for cross-reference processing, e.g. a.cogroup(b).filter { case (_, (l, r)) => l.nonEmpty && r.nonEmpty }.map { case (k, (l, r)) => (k, l) }. Best Regards, Raymond Liu -Original Message- From: marylucy [mailto:qaz163wsx_...@hotmail.com] Sent:
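
Expanded into a self-contained sketch of the same cogroup-then-filter pattern, with made-up sample data and assuming an existing SparkContext sc:

    import org.apache.spark.SparkContext._  // pair-RDD operations such as cogroup

    val a = sc.parallelize(Seq(("k1", 1), ("k2", 2), ("k3", 3)))
    val b = sc.parallelize(Seq(("k1", "x"), ("k3", "y")))

    // Keep only the keys present in both RDDs, retaining a's values.
    val crossRef = a.cogroup(b)
      .filter { case (_, (l, r)) => l.nonEmpty && r.nonEmpty }
      .map { case (k, (l, _)) => (k, l) }
    crossRef.collect()  // keys k1 and k3 survive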

RE: What is a Block Manager?

2014-08-27 Thread Liu, Raymond
, Raymond Liu From: Victor Tso-Guillen [mailto:v...@paxata.com] Sent: Wednesday, August 27, 2014 1:40 PM To: Liu, Raymond Cc: user@spark.apache.org Subject: Re: What is a Block Manager? We're a single-app deployment so we want to launch as many executors as the system has workers. We accomplish

RE: What is a Block Manager?

2014-08-26 Thread Liu, Raymond
Basically, a Block Manager manages the storage for most of the data in Spark; to name a few: blocks that represent a cached RDD partition, intermediate shuffle data, broadcast data, etc. There is one per executor, while in standalone mode you normally have one executor per worker. You don't control how
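
One way to see the per-executor block managers from a running application: a sketch using SparkContext.getExecutorMemoryStatus, which reports each block manager's host:port together with its maximum and remaining storage memory:

    // One entry per block manager, i.e. per executor (plus the driver's own).
    sc.getExecutorMemoryStatus.foreach { case (blockManager, (maxMem, remaining)) =>
      println(s"$blockManager: max=$maxMem bytes, remaining=$remaining bytes")
    }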

RE: Request for help in writing to Textfile

2014-08-25 Thread Liu, Raymond
You can try to manipulate the string you want to output before saveAsTextFile, something like modify.flatMap(x => x).map { x => val s = x.toString; s.subSequence(1, s.length - 1) }. There should be a more optimized way. Best Regards, Raymond Liu -Original Message- From: yh18190
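
The "more optimized way" would be to format each record explicitly instead of trimming Tuple2.toString. A sketch, assuming the records are key/value pairs, an existing SparkContext sc, and a hypothetical output path:

    // Produce the output text directly rather than post-processing toString.
    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2)))  // stand-in for the real RDD
    pairs.map { case (k, v) => s"$k,$v" }
      .saveAsTextFile("hdfs:///output")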

KMeansIterative in master branch ?

2014-07-15 Thread Liu, Raymond
I could not find the examples/flink-java-examples-*-KMeansIterative.jar in the trunk code. Has it been removed? If so, the run_example_quickstart doc needs to be updated too. Best Regards, Raymond Liu

RE: MIMA Compatiblity Checks

2014-07-10 Thread Liu, Raymond
So how do we run the check locally? On the master tree, sbt mimaReportBinaryIssues seems to report a lot of errors. Do we need to modify SparkBuild.scala etc. to run it locally? I could not figure out how Jenkins runs the check from its console outputs. Best Regards, Raymond Liu -Original

Unable to compile both trunk and 0.5.1 code

2014-06-30 Thread Liu, Raymond
Hi, I just cloned the code from incubator-flink; I tried both the trunk code and the 0.5.1 release and encountered the same problem. # mvn --version Apache Maven 3.0.4 Maven home: /usr/share/maven Java version: 1.6.0_30, vendor: Sun Microsystems Inc. Java home: /usr/java/jdk1.6.0_30/jre Default locale:

RE: Unable to compile both trunk and 0.5.1 code

2014-06-30 Thread Liu, Raymond
Actually, I tried JDK 1.7.0 and it works, while the project README says that Java 6, 7, or 8 all work... Best Regards, Raymond Liu -Original Message- From: Liu, Raymond [mailto:raymond@intel.com] Sent: Monday, June 30, 2014 4:13 PM To: dev@flink.incubator.apache.org Subject: Unable

RE: Unable to compile both trunk and 0.5.1 code

2014-06-30 Thread Liu, Raymond
, Robert On Mon, Jun 30, 2014 at 10:20 AM, Liu, Raymond raymond@intel.com wrote: Actually, tried jdk 1.7.0. It works. While the project readme said that Java 6, 7 or 8 both works... Best Regards, Raymond Liu -Original Message- From: Liu, Raymond [mailto:raymond@intel.com

RE: About StorageLevel

2014-06-26 Thread Liu, Raymond
I think there is a shuffle stage involved, and the future count job will depend directly on the first job's shuffle stage output, as long as it is still available. Thus it will be much faster. Best Regards, Raymond Liu From: tomsheep...@gmail.com [mailto:tomsheep...@gmail.com] Sent:
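
A sketch of the effect being described, assuming an existing SparkContext sc and a hypothetical input path; the second count job can reuse the first job's shuffle output while the shuffle files are still available:

    import org.apache.spark.SparkContext._  // pair-RDD operations such as reduceByKey

    val counts = sc.textFile("hdfs:///words")
      .flatMap(_.split(" "))
      .map(w => (w, 1))
      .reduceByKey(_ + _)  // introduces a shuffle stage
    counts.count()  // first job: runs the map stage and the shuffle
    counts.count()  // second job: skips the map stage, reads the shuffle output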

RE: When does Spark switch from PROCESS_LOCAL to NODE_LOCAL or RACK_LOCAL?

2014-06-05 Thread Liu, Raymond
If a task has no locality preference, it will also show up as PROCESS_LOCAL; I think we probably need to name it NO_PREFER to make that clearer. Not sure whether this is your case. Best Regards, Raymond Liu From: coded...@gmail.com [mailto:coded...@gmail.com] On Behalf Of Sung Hwan Chung
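
For reference, the related knobs: a sketch of the spark.locality.wait settings that control how long the scheduler waits before falling back to a less local level (the values are illustrative, in milliseconds):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.locality.wait", "3000")       // default wait before degrading a level
      .set("spark.locality.wait.node", "3000")  // node-local specific override
      .set("spark.locality.wait.rack", "3000")  // rack-local specific override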

RE: yarn-client mode question

2014-05-21 Thread Liu, Raymond
It seems you are asking whether the Spark-related jars need to be deployed to the YARN cluster manually before you launch the application? No, you don't need to, just like any other YARN application. And it doesn't matter whether it is yarn-client or yarn-cluster mode. Best Regards, Raymond Liu -Original

RE: different in spark on yarn mode and standalone mode

2014-05-04 Thread Liu, Raymond
At the core, they are not that different. In standalone mode, you have the Spark master and Spark workers, which allocate the driver and executors for your Spark app, while in YARN mode the YARN resource manager and node managers do this work. Once the driver and executors have been launched, the rest of
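
The application code itself is the same in both cases; only the master URL differs. A sketch using 2014-era master strings and a hypothetical host name:

    import org.apache.spark.SparkConf

    // Standalone mode: the Spark master and workers allocate driver and executors.
    val standalone = new SparkConf().setMaster("spark://master-host:7077")
    // YARN mode: the resource manager and node managers do that work instead.
    val onYarn = new SparkConf().setMaster("yarn-client")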

RE: How fast would you expect shuffle serialize to be?

2014-04-30 Thread Liu, Raymond
So it seems to me that when running the full-path code in my previous case, 32 cores with 50MB/s total throughput is reasonable? Best Regards, Raymond Liu -Original Message- From: Liu, Raymond [mailto:raymond@intel.com] The latter case: total throughput aggregated from all cores

RE: Shuffle Spill Issue

2014-04-29 Thread Liu, Raymond
per word occurrence. On Tue, Apr 29, 2014 at 7:48 AM, Liu, Raymond raymond@intel.com wrote: Hi Patrick, I am just doing a simple word count; the data is generated by the Hadoop random text writer. This seems to me not quite related to compression; if I turn off compression on shuffle

How fast would you expect shuffle serialize to be?

2014-04-29 Thread Liu, Raymond
Hi, I am running a WordCount program which counts words from HDFS, and I noticed that the serializer part of the code takes a lot of CPU time. On a 16-core/32-thread node, the total throughput is around 50MB/s with JavaSerializer, and if I switch to KryoSerializer, it doubles to around
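
For reference, switching serializers is a one-line configuration change. A sketch; registering the application's classes with Kryo (not shown) typically improves throughput further:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("wordcount-kryo")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    val sc = new SparkContext(conf)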

RE: How fast would you expect shuffle serialize to be?

2014-04-29 Thread Liu, Raymond
For all the tasks, say 32 tasks in total. Best Regards, Raymond Liu -Original Message- From: Patrick Wendell [mailto:pwend...@gmail.com] Is this the serialization throughput per task or the serialization throughput for all the tasks? On Tue, Apr 29, 2014 at 9:34 PM, Liu, Raymond

RE: How fast would you expect shuffle serialize to be?

2014-04-29 Thread Liu, Raymond
directly instead of reading from HDFS, with similar throughput results) Best Regards, Raymond Liu -Original Message- From: Liu, Raymond [mailto:raymond@intel.com] For all the tasks, say 32 tasks in total. Best Regards, Raymond Liu -Original Message- From: Patrick Wendell

RE: How fast would you expect shuffle serialize to be?

2014-04-29 Thread Liu, Raymond
, 2014 at 10:14 PM, Liu, Raymond raymond@intel.com wrote: For all the tasks, say 32 task on total Best Regards, Raymond Liu -Original Message- From: Patrick Wendell [mailto:pwend...@gmail.com] Is this the serialization throughput per task or the serialization throughput

Shuffle Spill Issue

2014-04-28 Thread Liu, Raymond
Hi, I am running a simple word count program on a Spark standalone cluster. The cluster is made up of 6 nodes; each runs 4 workers, each worker owns 10G of memory, and each node has 16 cores, for 96 cores and 240G of memory in total. (Well, it also used to be configured as 1 worker with 40G of memory on each node.)

RE: Shuffle Spill Issue

2014-04-28 Thread Liu, Raymond
is definitely a bit strange, the data gets compressed when written to disk, but unless you have a weird dataset (E.g. all zeros) I wouldn't expect it to compress _that_ much. On Mon, Apr 28, 2014 at 1:18 AM, Liu, Raymond raymond@intel.com wrote: Hi         I am running a simple word count program

RE: Does yarn-stable still accept pull request?

2014-02-11 Thread Liu, Raymond
It should be fixed in both the alpha and stable code bases, since we aim to support both versions. Best Regards, Raymond Liu -Original Message- From: Nan Zhu [mailto:zhunanmcg...@gmail.com] Sent: Wednesday, February 12, 2014 10:29 AM To: dev@spark.incubator.apache.org Subject: Does yarn-stable

RE: Spark Master on Hadoop Job Tracker?

2014-01-20 Thread Liu, Raymond
Not sure what you aim to solve. When you mention Spark Master, I guess you probably mean Spark standalone mode? In that case, the Spark cluster is not necessarily coupled with the Hadoop cluster. But if you aim to achieve better data locality, then yes, running Spark workers on HDFS data nodes might

RE: Anyone know hot to submit spark job to yarn in java code?

2014-01-15 Thread Liu, Raymond
Hi, regarding your questions: 1) When I run the above script, which jar is submitted to the YARN server? Both what the SPARK_JAR env points to and what --jar points to are submitted to the YARN server. 2) It looks like the spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar plays the role of

RE: yarn, fat-jars and lib_managed

2014-01-09 Thread Liu, Raymond
I think you could put the Spark jar and the other jars your app depends on (those that don't change a lot) on HDFS, and use --files or --addjars (depending on the mode you run, YarnClient/YarnStandalone) to refer to them. Then you just need to redeploy your thin app jar on each invocation. Best Regards, Raymond

RE: Spark on Yarn classpath problems

2014-01-07 Thread Liu, Raymond
Not found in which part of the code? If in the SparkContext thread, say on the AM, --addJars should work. If on tasks, then --addJars won't work; you need to use --file=local://xxx etc., and I am not sure that is available in 0.8.1. And packing everything into a single jar should also work; if it doesn't, something might be wrong

RE: compiling against hadoop 2.2

2014-01-02 Thread Liu, Raymond
I think you also need to set yarn.version, say something like: mvn -Pyarn -Dhadoop.version=2.2.0 -Dyarn.version=2.2.0 -DskipTests clean package. hadoop.version defaults to 2.2.0 while yarn.version does not when you choose the new-yarn profile. We probably need to fix it later for ease of use.

RE: compiling against hadoop 2.2

2014-01-02 Thread Liu, Raymond
Sorry, it should be: mvn -Pnew-yarn -Dhadoop.version=2.2.0 -Dyarn.version=2.2.0 -DskipTests clean package. The one in the previous mail is not yet available. Best Regards, Raymond Liu -Original Message- From: Liu, Raymond Sent: Friday, January 03, 2014 2:09 PM To: dev@spark.incubator.apache.org Subject

RE: compiling against hadoop 2.2

2014-01-02 Thread Liu, Raymond
@spark.incubator.apache.org Subject: Re: compiling against hadoop 2.2 Specification of yarn.version can be inserted following this line (#762 in pom.xml), right? <hadoop.version>2.2.0</hadoop.version> On Thu, Jan 2, 2014 at 10:10 PM, Liu, Raymond raymond@intel.com wrote: Sorry, mvn -Pnew-yarn

RE: compiling against hadoop 2.2

2014-01-02 Thread Liu, Raymond
And I am not sure whether it is valuable to provide different settings for the hadoop/hdfs and yarn versions. When building with SBT, they will always be the same; maybe in mvn we should do so too. Best Regards, Raymond Liu -Original Message- From: Liu, Raymond Sent: Friday, January 03

RE: Errors with spark-0.8.1 hadoop-yarn 2.2.0

2013-12-29 Thread Liu, Raymond
Hi Izhar Is that the exact command you are running? Say with 0.8.0 instead of 0.8.1 in the cmd? Raymond Liu From: Izhar ul Hassan [mailto:ezh...@gmail.com] Sent: Friday, December 27, 2013 9:40 PM To: user@spark.incubator.apache.org Subject: Errors with spark-0.8.1 hadoop-yarn 2.2.0

RE: Unable to load additional JARs in yarn-client mode

2013-12-23 Thread Liu, Raymond
Ido, when you say adding external JARs, do you mean via -addJars, which adds jars for the SparkContext to use in the AM env? If so, I think you don't need it for yarn-client mode at all; in yarn-client mode the SparkContext runs locally, so I think you just need to make sure those jars are in the

RE: About spark.driver.host

2013-12-16 Thread Liu, Raymond
It's what the document says. For yarn-standalone mode, it will be the host where the Spark AM runs, while for yarn-client mode, it will be the local host where you run the cmd. And what cmd do you use to run SparkPi? I think you actually don't need to set spark.driver.host manually for YARN mode,

RE: About spark.driver.host

2013-12-16 Thread Liu, Raymond
: AppMaster received a signal. 13/12/17 11:07:13 WARN yarn.ApplicationMaster: Failed to connect to driver at null:null, retrying ... After retrying 'spark.yarn.applicationMaster.waitTries' times (default 10), the job failed. On Tue, Dec 17, 2013 at 12:07 PM, Liu, Raymond raymond@intel.com

RE: About spark.driver.host

2013-12-16 Thread Liu, Raymond
-distributed? On Tue, Dec 17, 2013 at 1:03 PM, Liu, Raymond raymond@intel.com wrote: Hmm, I can't tell which mode you are trying to use. Do you specify the MASTER in a conf file? I think in the run-on-yarn doc, the example for yarn-standalone mode mentions that you

RE: [VOTE] Release Apache Spark 0.8.1-incubating (rc4)

2013-12-15 Thread Liu, Raymond
Hi Azuryy, please check https://spark-project.atlassian.net/browse/SPARK-995 for this protobuf version issue. Best Regards, Raymond Liu -Original Message- From: Azuryy Yu [mailto:azury...@gmail.com] Sent: Monday, December 16, 2013 10:30 AM To: dev@spark.incubator.apache.org Subject: Re:

RE: [VOTE] Release Apache Spark 0.8.1-incubating (rc4)

2013-12-15 Thread Liu, Raymond
am not sure, it might have a problem. If it does, you might need to build Mesos against 2.5.0; I haven't tested that, so if you have time, would you mind running a test? Best Regards, Raymond Liu -Original Message- From: Liu, Raymond [mailto:raymond@intel.com] Sent: Monday, December 16, 2013 10:48 AM

RE: Scala 2.10 Merge

2013-12-12 Thread Liu, Raymond
Hi Patrick, what does that mean for dropping YARN 2.2? It seems the code is still there. You mean if built upon 2.2 it will break and won't work, right? Since the home-made Akka built on Scala 2.10 is not there. While, if that is the case, can we just use Akka 2.3-M1, which runs on protobuf 2.5

RE: Scala 2.10 Merge

2013-12-12 Thread Liu, Raymond
. Akka is the source of our hardest-to-find bugs and simultaneously trying to support 2.2.3 and 2.3-M1 is a bit daunting. Of course, if you are building off of master you can maintain a fork that uses this. - Patrick On Thu, Dec 12, 2013 at 12:42 AM, Liu, Raymond raymond@intel.comwrote

RE: How to resolve right dependency which enabled and built/install with profile?

2013-12-11 Thread Liu, Raymond
of libjar within A and the whole thing becomes moot anyway... -Stephen On 11 December 2013 00:52, Liu, Raymond raymond@intel.com wrote: Thanks Stephen, I see your solution is to let B manage the libjar version. But this is against my wish; I want B to know nothing about A's internal

RE: How to resolve right dependency which enabled and built/install with profile?

2013-12-10 Thread Liu, Raymond
as the final later after B and *override* the transitive dep on libjar in the two fatjar building modules On Tuesday, 10 December 2013, Liu, Raymond wrote: Hi I have a project with module A that will be built with or without profile say -Pnewlib , thus I can have it build with different version

RE: How to resolve right dependency which enabled and built/install with profile?

2013-12-10 Thread Liu, Raymond
can then swap in any compatible version of libjar by providing it at run-time. Still not sure why you can not just use the latest version of libjar if any version will work at run-time. On 10/12/2013 7:52 PM, Liu, Raymond wrote: Thanks Stephen I see your solution is let B manage

How to resolve right dependency which enabled and built/install with profile?

2013-12-09 Thread Liu, Raymond
Hi, I have a project with module A that is built with or without a profile, say -Pnewlib; thus I can build it with different versions of a library dependency, say libjar-1.0 by default and libjar-2.0 with -Pnewlib. Then I have a module B that depends on module A. The

RE: Spark over YARN

2013-12-04 Thread Liu, Raymond
YARN alpha API support is already there. If you mean the YARN stable API in Hadoop 2.2, it will probably be in 0.8.1. Best Regards, Raymond Liu From: Pranay Tonpay [mailto:pranay.ton...@impetus.co.in] Sent: Thursday, December 05, 2013 12:53 AM To: user@spark.incubator.apache.org Subject: Spark over

RE: Worker failed to connect when build with SPARK_HADOOP_VERSION=2.2.0

2013-12-02 Thread Liu, Raymond
What version of the code are you using? 2.2.0 support is not yet merged into trunk. Check out https://github.com/apache/incubator-spark/pull/199 Best Regards, Raymond Liu From: horia@gmail.com [mailto:horia@gmail.com] On Behalf Of Horia Sent: Monday, December 02, 2013 3:00 PM To:

Any doc related to hive on hadoop 2.2?

2013-12-02 Thread Liu, Raymond
Hi, it seems to me that a lot of Hadoop 2.2 support work has been done on trunk, but I can't find related documentation. Is there any doc I can refer to, say build/run BKM etc.? Especially the parts related to the YARN stable API in Hadoop 2.2, since it changed a lot from alpha. Best Regards, Raymond

RE: which repo for kafka_2.10 ?

2013-11-08 Thread Liu, Raymond
://kafka.apache.org/code.html and build it yourself. 2013/11/8 Liu, Raymond raymond@intel.com: If I want to use kafka_2.10 0.8.0-beta1, which repo should I go to? It seems the Apache repo doesn't have it, while there are com.sksamuel.kafka and com.twitter.tormenta-kafka_2.10. Which one should I

what's the strategy for code sync between branches e.g. scala-2.10 v.s. master?

2013-11-04 Thread Liu, Raymond
Hi, it seems to me that dev branches are kept in sync with master by continually merging trunk code; e.g., the scala-2.10 branch continuously merges the latest master code into itself for updates. I am wondering, what is the general guideline for doing this? It seems to me that not every piece of code in

RE: issue regarding akka, protobuf and Hadoop version

2013-11-04 Thread Liu, Raymond
a maintenance release of Akka that supports protobuf 2.5. None of these are ideal, but we'd have to pick one. It would be great if you have other suggestions. On Sun, Nov 3, 2013 at 11:46 PM, Liu, Raymond raymond@intel.com wrote: Hi I am working on porting spark onto Hadoop 2.2.0

Executor could not connect to Driver?

2013-11-01 Thread Liu, Raymond
Hi, I am encountering an issue where the executor actor cannot connect to the driver actor, but I cannot figure out the reason. Say the driver actor is listening on :35838 root@sr434:~# netstat -lpv Active Internet connections (only servers) Proto Recv-Q Send-Q Local Address

RE: spark-0.8.0 and hadoop-2.1.0-beta

2013-10-29 Thread Liu, Raymond
I am also working on porting the trunk code onto 2.2.0. There seem to be quite a few API changes, but many of them are just renames. YARN 2.1.0-beta also added some client APIs for easier interaction with the YARN framework, but there are not many examples of how to use them (the API and wiki docs are both

RE: if i configed NN HA,should i still need start backup node?

2013-10-29 Thread Liu, Raymond
You don't need to, if the wiki page is correct. Best Regards, Raymond Liu From: ch huang [mailto:justlo...@gmail.com] Sent: Tuesday, October 29, 2013 12:01 PM To: user@hadoop.apache.org Subject: if i configed NN HA,should i still need start backup node? ATT

Yarn 2.2 docs or examples?

2013-10-29 Thread Liu, Raymond
Hi, I am playing with YARN 2.2, trying to port some code from the pre-beta API onto the stable API, but both the wiki doc and the API doc for 2.2.0 seem to still stick with the old API. Though I could find some help from

How to use Hadoop2 HA's logical name URL?

2013-10-24 Thread Liu, Raymond
Hi, I have set up a Hadoop 2.2.0 HA cluster following: http://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html#Configuration_details and I can check both the active and standby namenodes via the web interface. However, it seems that the logical name could

RE: Using Hbase with NN HA

2013-10-24 Thread Liu, Raymond
I encountered a similar issue with the NN HA URL. Have you made it work? Best Regards, Raymond Liu -Original Message- From: Siddharth Tiwari [mailto:siddharth.tiw...@live.com] Sent: Friday, October 18, 2013 5:17 PM To: user@hadoop.apache.org Subject: Using Hbase with NN HA Hi team, Can Hbase be

RE: How to use Hadoop2 HA's logical name URL?

2013-10-24 Thread Liu, Raymond
Hmm, my bad. The NameserviceID was not in sync in one of the properties. After fixing it, it works. Best Regards, Raymond Liu -Original Message- From: Liu, Raymond [mailto:raymond@intel.com] Sent: Thursday, October 24, 2013 3:03 PM To: user@hadoop.apache.org Subject: How to use Hadoop2 HA's

Fail to run on yarn with release version?

2013-08-16 Thread Liu, Raymond
Hi, I could run the Spark trunk code on top of YARN 2.0.5-alpha with: SPARK_JAR=./core/target/spark-core-assembly-0.8.0-SNAPSHOT.jar ./run spark.deploy.yarn.Client \ --jar examples/target/scala-2.9.3/spark-examples_2.9.3-0.8.0-SNAPSHOT.jar \ --class spark.examples.SparkPi \ --args

RE: Failed to run wordcount on YARN

2013-07-14 Thread Liu, Raymond
/MAPREDUCE-3193. You can give the Job an input dir which doesn't have nested dirs, or you can make use of the old FileInputFormat API to read files recursively in the subdirs. Thanks, Devaraj K -Original Message- From: Liu, Raymond [mailto:raymond@intel.com] Sent: 12 July 2013 12

Failed to run wordcount on YARN

2013-07-12 Thread Liu, Raymond
Hi, I have just started to try out Hadoop 2.0; I use the 2.0.5-alpha package and followed http://hadoop.apache.org/docs/r2.0.5-alpha/hadoop-project-dist/hadoop-common/ClusterSetup.html to set up a cluster in non-secure mode. HDFS works fine with the client tools, but when I run the wordcount example, there

what's the typical scan latency?

2013-06-03 Thread Liu, Raymond
Hi, if all the data is already in the RS block cache, then what is the typical scan latency for scanning a few rows from a, say, several-GB table (with dozens of regions) on a small cluster with, say, 4 RSs? A few ms? Tens of ms? Or more? Best Regards, Raymond Liu

RE: what's the typical scan latency?

2013-06-03 Thread Liu, Raymond
ramkrishna.s.vasude...@gmail.com wrote: What is that you are observing now? Regards Ram On Mon, Jun 3, 2013 at 2:00 PM, Liu, Raymond raymond@intel.com wrote: Hi If all the data is already in RS blockcache. Then what's the typical scan latency for scan a few

RE: checkAnd...

2013-05-15 Thread Liu, Raymond
How about this one : https://issues.apache.org/jira/browse/HBASE-8542 Best Regards, Raymond Liu -Original Message- From: Lior Schachter [mailto:lior...@gmail.com] Sent: Thursday, May 16, 2013 1:18 AM To: user Subject: Re: checkAnd... yes, I believe this will cover most of the

RE: How to implement this check put and then update something logic?

2013-05-13 Thread Liu, Raymond
at large scale better. I'm saying this since you have one ID referencing another ID (using target ID). On May 10, 2013, at 11:47 AM, Liu, Raymond raymond@intel.com wrote: Thanks, seems there are no other better solution? Really need a GetAndPut atomic op here ... You can do

RE: How to implement this check put and then update something logic?

2013-05-10 Thread Liu, Raymond
Thanks, it seems there is no better solution? I really need a GetAndPut atomic op here ... You can do this by looping over a checkAndPut operation until it succeeds. -Mike On Thu, May 9, 2013 at 8:52 PM, Liu, Raymond raymond@intel.com wrote: Any suggestions? Hi
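
A sketch of that checkAndPut retry loop against the 0.94-era HBase client API, written in Scala; the table, family, and qualifier names are made up, and the row is assumed to already hold a long value:

    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.client.{Get, HTable, Put}
    import org.apache.hadoop.hbase.util.Bytes

    val table = new HTable(HBaseConfiguration.create(), "records")
    val row = Bytes.toBytes("id-1")
    val (cf, q) = (Bytes.toBytes("f"), Bytes.toBytes("count"))

    var done = false
    while (!done) {
      // Read the current value and compute the update from it...
      val current = table.get(new Get(row)).getValue(cf, q)
      val put = new Put(row)
      put.add(cf, q, Bytes.toBytes(Bytes.toLong(current) + 1))
      // ...then apply it only if the row still holds the value we read.
      // A concurrent writer makes this return false, so we loop and retry.
      done = table.checkAndPut(row, cf, q, current, put)
    }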

RE: How to implement this check put and then update something logic?

2013-05-09 Thread Liu, Raymond
Any suggestions? Hi, say I have four fields for one record: id, status, targetid, and count. Status is on or off, targetid can reference another id, and count records the number of 'on' statuses across all targetids from the same id. Records can be added/deleted, or

How to implement this check put and then update something logic?

2013-05-08 Thread Liu, Raymond
Hi, say I have four fields for one record: id, status, targetid, and count. Status is on or off, targetid can reference another id, and count records the number of 'on' statuses across all targetids from the same id. Records can be added/deleted, or updated to change the

RE: How to implement this check put and then update something logic?

2013-05-08 Thread Liu, Raymond
Btw, is it possible or practical to implement something like PutAndGet, which puts in the new row and returns the old row back to the client? That would help a lot in my case. Oh, I realize it would be better named GetAndMutate; say, mutate anyway, but return the

RE: 答复: HBase random read performance

2013-04-16 Thread Liu, Raymond
So what is lacking here? Should the action also be parallelized inside the RS for each region, instead of just at the RS level? It seems this would be rather difficult to implement, and for Get it might not be worth it? I looked at src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java

RE: composite query on hbase and rcfile

2013-04-10 Thread Liu, Raymond
I guess Rob means using one query to query an RCFile and an HBase table at the same time. If your query is on two tables, one on an RCFile and another on HBase through the HBase storage handler, I think that should be OK. Best Regards, Raymond Liu What does a composite query mean? Hive's query doesn't

Does a major compact flush memstore?

2013-03-12 Thread Liu, Raymond
It seems to me that a major_compact table command from the HBase shell does not flush the memstore? When I am done with the major compaction, there is still some data in the memstore, and it gets flushed out to disk when I shut down the HBase cluster. Best Regards, Raymond Liu

RE: Does a major compact flush memstore?

2013-03-12 Thread Liu, Raymond
is flushed? Then a user-invoked compaction doesn't force a flush? Best Regards, Raymond Liu Did you try from the Java API? If the flush does not happen we may need to fix it. Regards, Ram On Tue, Mar 12, 2013 at 1:04 PM, Liu, Raymond raymond@intel.com wrote: It seems to me

RE: Does a major compact flush memstore?

2013-03-12 Thread Liu, Raymond
that it will end up in a single store file per region. Best Regards, Raymond Liu Raymond: Major compaction does not first flush. Should it or should it be an option? St.Ack On Tue, Mar 12, 2013 at 6:46 PM, Liu, Raymond raymond@intel.com wrote: I tried both hbase shell's
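
Since major compaction does not flush first, forcing the single-store-file outcome takes an explicit flush. A sketch against the 0.94-era HBaseAdmin API (both calls are asynchronous, so a real script would wait between them; the table name is made up):

    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.client.HBaseAdmin

    val admin = new HBaseAdmin(HBaseConfiguration.create())
    admin.flush("mytable")         // flush memstores to store files first
    admin.majorCompact("mytable")  // then merge the store files into one per region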

RE: How HBase perform per-column scan?

2013-03-10 Thread Liu, Raymond
Just curious, won't the ROWCOL bloom filter work for this case? Best Regards, Raymond Liu As said above, you will need a full table scan on that CF. As Ted said, consider having a look at your schema design. -Anoop- On Sun, Mar 10, 2013 at 8:10 PM, Ted Yu yuzhih...@gmail.com

RE: How HBase perform per-column scan?

2013-03-10 Thread Liu, Raymond
(qualifier) is present in an HFile or not. But the user doesn't know the row keys; he wants all the rows with column 'x'. -Anoop- From: Liu, Raymond [raymond@intel.com] Sent: Monday, March 11, 2013 7:43 AM To: user@hbase.apache.org Subject: RE

Is there any way to balance one table?

2013-02-19 Thread Liu, Raymond
Hi, is there any way to balance just one table? I found that one of my tables is not balanced while all the other tables are balanced, so I want to fix this table. Best Regards, Raymond Liu

RE: Is there any way to balance one table?

2013-02-19 Thread Liu, Raymond
: Is there any way to balance one table? What version of HBase are you using ? 0.94 has per-table load balancing. Cheers On Tue, Feb 19, 2013 at 5:01 PM, Liu, Raymond raymond@intel.com wrote: Hi Is there any way to balance just one table? I found one of my table is not balanced

RE: Is there any way to balance one table?

2013-02-19 Thread Liu, Raymond
, February 20, 2013 9:09 AM To: user@hbase.apache.org Subject: Re: Is there any way to balance one table? What version of HBase are you using ? 0.94 has per-table load balancing. Cheers On Tue, Feb 19, 2013 at 5:01 PM, Liu, Raymond raymond@intel.com wrote: Hi

RE: Is there any way to balance one table?

2013-02-19 Thread Liu, Raymond
@hbase.apache.org Subject: Re: Is there any way to balance one table? Hi Liu, why didn't you simply call the balancer? If the other tables are already balanced, it should not touch them and will only balance the table which is not balanced. JM 2013/2/19, Liu, Raymond raymond@intel.com

RE: Is there any way to balance one table?

2013-02-19 Thread Liu, Raymond
on this table? Best Regards, Raymond Liu From: Marcos Ortiz [mailto:mlor...@uci.cu] Sent: Wednesday, February 20, 2013 11:44 AM To: user@hbase.apache.org Cc: Liu, Raymond Subject: Re: Is there any way to balance one table? What is the size of your table? On 02/19/2013 10:40 PM, Liu, Raymond wrote: Hi

RE: Is there any way to balance one table?

2013-02-19 Thread Liu, Raymond
count on any server can be as far as 20% from the average region count. You can tighten the sloppiness. On Tue, Feb 19, 2013 at 7:40 PM, Liu, Raymond raymond@intel.com wrote: Hi, I do call the balancer, but it seems it doesn't work. It might be because this table is small and the overall region

RE: Is there any way to balance one table?

2013-02-19 Thread Liu, Raymond
Hmm, in order to have the 96-region table balanced within 20% on a 3000-region cluster where all other tables are balanced, the slop would need to be around 20%/30, say 0.006? Won't that be too small? Yes, Raymond. You should lower the sloppiness. On Tue, Feb 19, 2013 at 7:48 PM, Liu, Raymond

RE: Is there any way to balance one table?

2013-02-19 Thread Liu, Raymond
You mean the slop is also applied per table? Weird, then it should work for my case; let me check again. Best Regards, Raymond Liu bq. On a 3000 region cluster Balancing is per-table, meaning the total number of regions doesn't come into play. On Tue, Feb 19, 2013 at 7:55 PM, Liu, Raymond

RE: why my test result on dfs short circuit read is slower?

2013-02-17 Thread Liu, Raymond
, the BlockSender.sendChunks will read and send data in 64KB units? Is that true? And if so, wouldn't that explain why reading through the datanode is faster, since it reads data in a bigger block size? Best Regards, Raymond Liu -Original Message- From: Liu, Raymond [mailto:raymond@intel.com

RE: why my test result on dfs short circuit read is slower?

2013-02-17 Thread Liu, Raymond
in 64KB units? Is that true? And if so, wouldn't that explain why reading through the datanode is faster, since it reads data in a bigger block size? Best Regards, Raymond Liu -Original Message- From: Liu, Raymond [mailto:raymond@intel.com] Sent: Saturday, February 16

why my test result on dfs short circuit read is slower?

2013-02-15 Thread Liu, Raymond
Hi, I tried to use short-circuit read to improve my HBase cluster MR scan performance. I have the following settings in hdfs-site.xml: dfs.client.read.shortcircuit set to true, dfs.block.local-path-access.user set to the MR job runner. The cluster is a 1+4 node

RE: why my test result on dfs short circuit read is slower?

2013-02-15 Thread Liu, Raymond
, did you enable the security feature in your cluster? There will be no obvious benefit to be found if so. Regards, Liang ___ From: Liu, Raymond [raymond@intel.com] Sent: February 16, 2013 11:10 To: user@hadoop.apache.org Subject: why my test result on dfs short

RE: why my test result on dfs short circuit read is slower?

2013-02-15 Thread Liu, Raymond
will be attempted but will begin to fail. On Sat, Feb 16, 2013 at 8:40 AM, Liu, Raymond raymond@intel.com wrote: Hi I tried to use short circuit read to improve my hbase cluster MR scan performance. I have the following setting in hdfs-site.xml

RE: why my test result on dfs short circuit read is slower?

2013-02-15 Thread Liu, Raymond
for file. This would confirm that short-circuit read is happening. -- Arpit Gupta Hortonworks Inc. http://hortonworks.com/ On Feb 15, 2013, at 9:53 PM, Liu, Raymond raymond@intel.com wrote: Hi Harsh, yes, I did set both of these, though not in hbase-site.xml but hdfs-site.xml. And I have

RE: why my test result on dfs short circuit read is slower?

2013-02-15 Thread Liu, Raymond
that read through datanode will be faster? Since it read data in bigger block size. Best Regards, Raymond Liu -Original Message- From: Liu, Raymond [mailto:raymond@intel.com] Sent: Saturday, February 16, 2013 2:23 PM To: user@hadoop.apache.org Subject: RE: why my test result on dfs

RE: Multiple RS for serving one region

2013-01-23 Thread Liu, Raymond
Is it also possible to control which disk the blocks are assigned to? Say, when there are multiple disks on one node, I wish the blocks belonging to the local region were distributed evenly across the disks. At present, it seems they are not, though if you take non-local regions' replica blocks in

RE: How is DataXceiver been used?

2013-01-17 Thread Liu, Raymond
://blog.cloudera.com/blog/2012/03/hbase-hadoop-xceivers/ helpful. It explains the problem and the solution in great detail. Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Thu, Jan 17, 2013 at 12:14 PM, Liu, Raymond raymond@intel.com wrote: Hi, I have

around 500 (CLOSE_WAIT) connection

2013-01-17 Thread Liu, Raymond
Hi, I have Hadoop 1.1.1 and HBase 0.94.1, with around 300 regions on each region server. Right after the cluster is started, before I do anything, there are already around 500 CLOSE_WAIT connections from the regionserver process to the datanode process. Is that normal? It seems there are a

Trouble shooting process for a random lag region issue.

2013-01-16 Thread Liu, Raymond
On 1/4/13 10:37 PM, Liu, Raymond raymond@intel.com wrote: Hi, I encountered a weird lagging map task issue here: I have a small hadoop/hbase cluster with 1 master node and 4 regionserver nodes, all with 16 CPUs and map and reduce slots set to 24. A few tables
