RE: memory size for caching RDD
You don't need to. The memory is not statically allocated to the RDD cache; the fraction is just an upper limit. If the RDD cache doesn't use up the memory, it is always available for other usage, except for the parts also controlled by some memoryFraction conf, e.g. spark.shuffle.memoryFraction, for which you also set an upper limit. Best Regards, Raymond Liu From: 牛兆捷 [mailto:nzjem...@gmail.com] Sent: Thursday, September 04, 2014 2:27 PM To: Patrick Wendell Cc: u...@spark.apache.org; dev@spark.apache.org Subject: Re: memory size for caching RDD But is it possible to make it resizable? When we don't have many RDDs to cache, we can give some memory to others. 2014-09-04 13:45 GMT+08:00 Patrick Wendell <pwend...@gmail.com>: Changing this is not supported; it is immutable, similar to other Spark configuration settings. On Wed, Sep 3, 2014 at 8:13 PM, 牛兆捷 <nzjem...@gmail.com> wrote: Dear all: Spark uses memory to cache RDDs and the memory size is specified by spark.storage.memoryFraction. Once the Executor starts, does Spark support adjusting/resizing the memory size of this part dynamically? Thanks. -- Regards, Zhaojie
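A minimal sketch of how these upper limits are set at application startup, assuming Spark 1.x where the legacy memoryFraction settings discussed in this thread apply; the values are illustrative only, not recommendations:

    import org.apache.spark.{SparkConf, SparkContext}

    // Both settings are fixed once the executor JVMs launch; they cap,
    // rather than reserve, the memory used for caching and shuffle.
    val conf = new SparkConf()
      .setAppName("memory-fraction-example")
      .set("spark.storage.memoryFraction", "0.5")  // upper limit for the RDD cache
      .set("spark.shuffle.memoryFraction", "0.3")  // upper limit for shuffle aggregation buffers
    val sc = new SparkContext(conf)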
RE: memory size for caching RDD
I think there is no public API available to do this. In this case, the best you can do might be to unpersist some RDDs manually. The problem is that this is done per RDD, not per block. Also, if the storage level includes disk, the data on disk will be removed too. Best Regards, Raymond Liu From: 牛兆捷 [mailto:nzjem...@gmail.com] Sent: Thursday, September 04, 2014 2:57 PM To: Liu, Raymond Cc: Patrick Wendell; u...@spark.apache.org; dev@spark.apache.org Subject: Re: memory size for caching RDD Oh I see. I want to implement something like this: sometimes I need to release some memory for other usage even when it is occupied by some RDDs (they can be recomputed with the help of lineage when they are needed). Does Spark provide interfaces to force it to release some memory? 2014-09-04 14:32 GMT+08:00 Liu, Raymond <raymond@intel.com>: [snip: quoted message; see above] -- Regards, Zhaojie
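A short sketch of the manual unpersist approach described above, assuming an existing SparkContext sc; the input path is illustrative:

    val cached = sc.textFile("hdfs:///data/input").cache()   // MEMORY_ONLY
    cached.count()                                           // materialize the cache

    // Later, free the memory for other usage. This drops the whole RDD's
    // blocks (and, for MEMORY_AND_DISK levels, the disk copies too).
    // The RDD remains usable afterwards -- it is recomputed from lineage.
    cached.unpersist()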
RE: RDDs
Actually, a replicated RDD and a parallel job on the same RDD are two concepts that are not related at all. A replicated RDD just stores data on multiple nodes; it helps with HA and provides a better chance for data locality. It is still one RDD, not two separate RDDs. As for running two jobs on the same RDD, it doesn't matter whether the RDD is replicated or not; you can always do it if you wish to. Best Regards, Raymond Liu -Original Message- From: Kartheek.R [mailto:kartheek.m...@gmail.com] Sent: Thursday, September 04, 2014 1:24 PM To: u...@spark.incubator.apache.org Subject: RE: RDDs Thank you Raymond and Tobias. Yeah, I am very clear about what I was asking. I was talking about a replicated RDD only. Now that I've got my understanding about job and application validated, I wanted to know if we can replicate an RDD and run two jobs (that need the same RDD) of an application in parallel. -Karthk -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/RDDs-tp13343p13416.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
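A sketch of running two jobs on the same (cached) RDD in parallel from one application, assuming an existing SparkContext sc and Scala 2.10+ futures; Spark's scheduler accepts jobs submitted from separate threads:

    import scala.concurrent.{Await, Future}
    import scala.concurrent.ExecutionContext.Implicits.global
    import scala.concurrent.duration._

    val rdd = sc.textFile("hdfs:///data/input").cache()

    // Two actions submitted concurrently become two jobs over the same RDD.
    val countJob = Future { rdd.count() }
    val distinctJob = Future { rdd.distinct().count() }

    val total = Await.result(countJob, 10.minutes)
    val distinct = Await.result(distinctJob, 10.minutes)
    println("total=" + total + " distinct=" + distinct)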
RE: RDDs
Not sure what you were referring to when saying replicated RDD. If you actually mean RDD, then yes, read the API doc and paper as Tobias mentioned. If you actually focus on the word replicated, then that is for fault tolerance, and probably mostly used in the streaming case for receiver-created RDDs. For Spark, an Application is your user program, and a job is an internal scheduling concept: a group of RDD operations triggered by an action. Your application might invoke several jobs. Best Regards, Raymond Liu From: rapelly kartheek [mailto:kartheek.m...@gmail.com] Sent: Wednesday, September 03, 2014 5:03 PM To: user@spark.apache.org Subject: RDDs Hi, Can someone tell me what kind of operations can be performed on a replicated RDD? What are the use-cases of a replicated RDD? One basic doubt that has been bothering me for a long time: what is the difference between an application and a job in Spark parlance? I am confused because of the Hadoop jargon. Thank you
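A tiny illustration of the application/job distinction described above: one application (one SparkContext) whose two actions produce two separate jobs, assuming an existing SparkContext sc:

    // One application = this user program and its SparkContext.
    val data = sc.parallelize(1 to 1000000)
    val squares = data.map(x => x * x)   // transformation only: no job yet

    val n = squares.count()              // action -> job 1
    val s = squares.take(5)              // action -> job 2, shown separately in the UI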
RE: resize memory size for caching RDD
AFAIK, no. Best Regards, Raymond Liu From: 牛兆捷 [mailto:nzjem...@gmail.com] Sent: Thursday, September 04, 2014 11:30 AM To: user@spark.apache.org Subject: resize memory size for caching RDD Dear all: Spark uses memory to cache RDDs and the memory size is specified by spark.storage.memoryFraction. Once the Executor starts, does Spark support adjusting/resizing the memory size of this part dynamically? Thanks. -- Regards, Zhaojie
RE: how to filter value in spark
You could use cogroup to combine RDDs into one RDD for cross-reference processing, e.g.:

    a.cogroup(b)
      .filter { case (_, (l, r)) => l.nonEmpty && r.nonEmpty }
      .map { case (k, (l, r)) => (k, l) }

There should be a more optimized way. Best Regards, Raymond Liu -Original Message- From: marylucy [mailto:qaz163wsx_...@hotmail.com] Sent: Friday, August 29, 2014 9:26 PM To: Matthew Farrellee Cc: user@spark.apache.org Subject: Re: how to filter value in spark I see it works well, thank you!!! But in the following situation, how to do it? var a = sc.textFile("/sparktest/1/").map((_, "a")) var b = sc.textFile("/sparktest/2/").map((_, "b")) How to get (3,a) and (4,a)? On Aug 28, 2014, at 19:54, Matthew Farrellee <m...@redhat.com> wrote: On 08/28/2014 07:20 AM, marylucy wrote: fileA=1 2 3 4, one number a line, saved in /sparktest/1/ fileB=3 4 5 6, one number a line, saved in /sparktest/2/ I want to get 3 and 4 var a = sc.textFile("/sparktest/1/").map((_, 1)) var b = sc.textFile("/sparktest/2/").map((_, 1)) a.filter(param => { b.lookup(param._1).length > 0 }).map(_._1).foreach(println) Error thrown: scala.MatchError: Null at PairRDDFunctions.lookup... the issue is nesting of the b rdd inside a transformation of the a rdd consider using intersection, it's more idiomatic: a.intersection(b).foreach(println) but note that intersection will remove duplicates best, matt
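A self-contained version of the cogroup approach suggested above, assuming inputs keyed as in the question; the paths and the local master are illustrative:

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("cogroup-filter"))

    val a = sc.textFile("/sparktest/1/").map(line => (line, "a"))
    val b = sc.textFile("/sparktest/2/").map(line => (line, "b"))

    // cogroup yields (key, (values-from-a, values-from-b)); keeping keys
    // present on both sides gives the intersection while preserving a's value.
    val common = a.cogroup(b)
      .filter { case (_, (l, r)) => l.nonEmpty && r.nonEmpty }
      .map { case (k, (l, _)) => (k, l.head) }

    common.foreach(println)   // expect (3,a) and (4,a) for the sample files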
RE: What is a Block Manager?
The framework has this info to manage cluster status, and this info (e.g. worker number) is also available through the Spark metrics system. But from the user application's point of view, can you give an example of why you need this info, and what you plan to do with it? Best Regards, Raymond Liu From: Victor Tso-Guillen [mailto:v...@paxata.com] Sent: Wednesday, August 27, 2014 1:40 PM To: Liu, Raymond Cc: user@spark.apache.org Subject: Re: What is a Block Manager? We're a single-app deployment so we want to launch as many executors as the system has workers. We accomplish this by not configuring the max for the application. However, is there really no way to inspect what machines/executor ids/number of workers/etc is available in context? I'd imagine that there'd be something in the SparkContext or in the listener, but all I see in the listener is block managers getting added and removed. Wouldn't one care about the workers getting added and removed at least as much as for block managers? On Tue, Aug 26, 2014 at 6:58 PM, Liu, Raymond <raymond@intel.com> wrote: [snip: quoted message; see below]
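A sketch of pulling executor information out of the events that are available, assuming a Spark 1.x SparkContext sc; it uses the block-manager events Victor mentions, since one block manager is registered per executor:

    import org.apache.spark.scheduler.{SparkListener, SparkListenerBlockManagerAdded, SparkListenerBlockManagerRemoved}

    sc.addSparkListener(new SparkListener {
      override def onBlockManagerAdded(added: SparkListenerBlockManagerAdded): Unit = {
        val id = added.blockManagerId
        println("executor " + id.executorId + " up on " + id.host + ":" + id.port +
          ", maxMem=" + added.maxMem)
      }
      override def onBlockManagerRemoved(removed: SparkListenerBlockManagerRemoved): Unit = {
        println("executor " + removed.blockManagerId.executorId + " down")
      }
    })

    // A point-in-time view also exists: executor -> (max memory, remaining memory).
    sc.getExecutorMemoryStatus.foreach { case (executor, (max, remaining)) =>
      println(executor + ": " + remaining + " of " + max + " bytes free")
    }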
RE: What is a Block Manager?
Basically, a Block Manager manages the storage for most of the data in Spark, to name a few: blocks that represent cached RDD partitions, intermediate shuffle data, broadcast data, etc. It is per executor, and in standalone mode, normally, you have one executor per worker. You don't control how many workers you have at runtime, but you can somewhat manage how many executors your application will launch; check the different running modes' documentation for details (but control where? Hardly. YARN mode does some work based on data locality, but this is done by the framework, not the user program). Best Regards, Raymond Liu From: Victor Tso-Guillen [mailto:v...@paxata.com] Sent: Tuesday, August 26, 2014 11:42 PM To: user@spark.apache.org Subject: What is a Block Manager? I'm curious not only about what they do, but what their relationship is to the rest of the system. I find that I get listener events for n block managers added, where n is also the number of workers I have available to the application. Is this a stable constant? Also, are there ways to determine at runtime how many workers I have and where they are? Thanks, Victor
RE: Request for help in writing to Textfile
You can try to manipulate the string you want to output before saveAsTextFile, something like:

    modify.flatMap(x => x).map { x =>
      val s = x.toString
      s.subSequence(1, s.length - 1)
    }

There should be a more optimized way. Best Regards, Raymond Liu -Original Message- From: yh18190 [mailto:yh18...@gmail.com] Sent: Monday, August 25, 2014 9:57 PM To: u...@spark.incubator.apache.org Subject: Request for help in writing to Textfile Hi guys, I am currently playing with huge data. I have an RDD which returns RDD[List[(tuples)]]. I need only the tuples to be written to the textfile output using the saveAsTextFile function. Example: val mod = modify.saveAsTextFile() returns List((20140813,4,141127,3,HYPHLJLU,HY,KNGHWEB,USD,144.00,662.40,KY1), (20140813,4,141127,3,HYPHLJLU,HY,DBLHWEB,USD,144.00,662.40,KY1)) List((20140813,4,141127,3,HYPHLJLU,HY,KNGHWEB,USD,144.00,662.40,KY1), (20140813,4,141127,3,HYPHLJLU,HY,DBLHWEB,USD,144.00,662.40,KY1) I need the following output with only tuple values in a textfile: 20140813,4,141127,3,HYPHLJLU,HY,KNGHWEB,USD,144.00,662.40,KY1 20140813,4,141127,3,HYPHLJLU,HY,DBLHWEB,USD,144.00,662.40,KY1 Please let me know if anybody has any idea regarding this without using the collect() function... Please help me -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Request-for-help-in-writing-to-Textfile-tp12744.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
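An alternative sketch that avoids the subSequence string surgery by formatting each tuple directly; modify is the RDD[List[TupleN]] from the question, and the output path is illustrative:

    // Flatten the per-record lists, then join each tuple's fields with commas.
    // productIterator works for any TupleN, so no toString trimming is needed.
    val lines = modify
      .flatMap(identity)
      .map(_.productIterator.mkString(","))

    lines.saveAsTextFile("hdfs:///output/tuples-as-csv")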
KMeansIterative in master branch?
I could not find examples/flink-java-examples-*-KMeansIterative.jar in the trunk code. Has it been removed? If so, then the run_example_quickstart doc needs to be updated too. Best Regards, Raymond Liu
RE: MIMA Compatibility Checks
So how do I run the check locally? On the master tree, sbt mimaReportBinaryIssues seems to lead to a lot of reported errors. Do we need to modify SparkBuild.scala etc. to run it locally? I could not figure out how Jenkins runs the check from its console outputs. Best Regards, Raymond Liu -Original Message- From: Patrick Wendell [mailto:pwend...@gmail.com] Sent: Monday, June 09, 2014 3:40 AM To: dev@spark.apache.org Subject: MIMA Compatibility Checks Hey All, Some people may have noticed PR failures due to binary compatibility checks. We've had these enabled in several of the sub-modules since the 0.9.0 release, but we've turned them on in Spark core post 1.0.0, which has much higher churn. The checks are based on the migration manager tool from Typesafe. One issue is that the tool doesn't support package-private declarations of classes or methods. Prashant Sharma has built instrumentation that adds partial support for package-privacy (via a workaround), but since there isn't really native support for this in MIMA we are still finding cases in which we trigger false positives. In the next week or two we'll make it a priority to handle more of these false-positive cases. In the meantime users can add manual excludes to project/MimaExcludes.scala to avoid triggering warnings for certain issues. This is definitely annoying - sorry about that. Unfortunately we are the first open source Scala project to ever do this, so we are dealing with uncharted territory. Longer term I'd actually like to see us just write our own sbt-based tool to do this in a better way (we've had trouble trying to extend MIMA itself; it e.g. has copy-pasted code in it from an old version of the Scala compiler). If someone in the community is a Scala fan and wants to take that on, I'm happy to give more details. - Patrick
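A sketch of the manual-exclude mechanism Patrick describes, based on MiMa's standard filter API; the class and method names are hypothetical placeholders, and the exact shape of project/MimaExcludes.scala in the Spark tree may differ:

    import com.typesafe.tools.mima.core._

    // Each entry suppresses one reported incompatibility, typically a
    // false positive on a package-private member.
    val excludes = Seq(
      ProblemFilters.exclude[MissingMethodProblem](
        "org.apache.spark.SomeInternalClass.someRemovedMethod"),
      ProblemFilters.exclude[IncompatibleMethTypeProblem](
        "org.apache.spark.SomeInternalClass.someChangedMethod")
    )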
Unable to compile both trunk and 0.5.1 code
Hi, I just cloned the code from incubator Flink, tried both the trunk code and the 0.5.1 release, and encountered the same problem.

# mvn --version
Apache Maven 3.0.4
Maven home: /usr/share/maven
Java version: 1.6.0_30, vendor: Sun Microsystems Inc.
Java home: /usr/java/jdk1.6.0_30/jre
Default locale: en_US, platform encoding: UTF-8
OS name: linux, version: 3.8.0-29-generic, arch: amd64, family: unix

# mvn -DskipTests clean package
[INFO] Scanning for projects...
[INFO] Reactor Build Order:
[INFO] stratosphere
[INFO] stratosphere-core
[INFO] stratosphere-java
[INFO] stratosphere-runtime
[INFO] stratosphere-compiler
[INFO] stratosphere-scala
[INFO] stratosphere-clients
[INFO] stratosphere-examples
[INFO] stratosphere-java-examples
[INFO] stratosphere-scala-examples
[INFO] stratosphere-test-utils
[INFO] stratosphere-tests
[INFO] stratosphere-addons
[INFO] avro
[INFO] jdbc
[INFO] spargel
[INFO] hadoop-compatibility
[INFO] stratosphere-quickstart
[INFO] quickstart-java
[INFO] quickstart-scala
[INFO] stratosphere-dist
[INFO] Building stratosphere 0.5.1
[INFO] --- maven-clean-plugin:2.3:clean (default-clean) @ stratosphere ---
[INFO] Deleting file set: /root/git/incubator-flink/target (included: [**], excluded: [])
[INFO] --- maven-checkstyle-plugin:2.12.1:check (validate) @ stratosphere ---
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-maven) @ stratosphere ---
[INFO] --- maven-javadoc-plugin:2.9.1:jar (attach-javadocs) @ stratosphere ---
[INFO] Not executing Javadoc as the project is not a Java classpath-capable package
[INFO] --- maven-source-plugin:2.2.1:jar (attach-sources) @ stratosphere ---
[INFO] --- maven-checkstyle-plugin:2.12.1:check (validate) @ stratosphere ---
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-maven) @ stratosphere ---
[INFO] --- maven-source-plugin:2.2.1:jar (attach-sources) @ stratosphere ---
[INFO] Building stratosphere-core 0.5.1
[INFO] --- maven-clean-plugin:2.3:clean (default-clean) @ stratosphere-core ---
[INFO] Deleting file set: /root/git/incubator-flink/stratosphere-core/target (included: [**], excluded: [])
[INFO] --- maven-checkstyle-plugin:2.12.1:check (validate) @ stratosphere-core ---
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-maven) @ stratosphere-core ---
[INFO] --- maven-resources-plugin:2.3:resources (default-resources) @ stratosphere-core ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /root/git/incubator-flink/stratosphere-core/src/main/resources
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ stratosphere-core ---
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 225 source files to /root/git/incubator-flink/stratosphere-core/target/classes
[WARNING] /root/git/incubator-flink/stratosphere-core/src/main/java/eu/stratosphere/core/memory/MemoryUtils.java:[27,37] warning: sun.misc.Unsafe is Sun proprietary API and may be removed in a future release
[WARNING] /root/git/incubator-flink/stratosphere-core/src/main/java/eu/stratosphere/core/memory/MemoryUtils.java:[36,32] warning: sun.misc.Unsafe is Sun proprietary API and may be removed in a future release
[WARNING] /root/git/incubator-flink/stratosphere-core/src/main/java/eu/stratosphere/core/memory/MemorySegment.java:[963,38] warning: sun.misc.Unsafe is Sun proprietary API and may be removed in a future release
[WARNING] /root/git/incubator-flink/stratosphere-core/src/main/java/eu/stratosphere/types/Record.java:[1733,46] warning: sun.misc.Unsafe is Sun proprietary API and may be removed in a future release
[WARNING] /root/git/incubator-flink/stratosphere-core/src/main/java/eu/stratosphere/core/memory/MemoryUtils.java:[38,53] warning: sun.misc.Unsafe is Sun proprietary API and may be removed in a future release
[WARNING] /root/git/incubator-flink/stratosphere-core/src/main/java/eu/stratosphere/core/memory/MemoryUtils.java:[40,41] warning: sun.misc.Unsafe is Sun proprietary API and may be removed in a future release
[WARNING] Note: /root/git/incubator-flink/stratosphere-core/src/main/java/eu/stratosphere/api/common/Plan.java uses unchecked or unsafe operations.
[WARNING] Note: Recompile with -Xlint:unchecked for details.
[INFO] --- maven-resources-plugin:2.3:testResources (default-testResources) @ stratosphere-core ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
RE: Unable to compile both trunk and 0.5.1 code
Actually, I tried JDK 1.7.0 and it works, while the project readme says that Java 6, 7, or 8 all work... Best Regards, Raymond Liu -Original Message- From: Liu, Raymond [mailto:raymond@intel.com] Sent: Monday, June 30, 2014 4:13 PM To: dev@flink.incubator.apache.org Subject: Unable to compile both trunk and 0.5.1 code [snip: original message with the full mvn output; see above]
RE: Unable to compile both trunk and 0.5.1 code
Great, thanks for the reply :) Best Regards, Raymond Liu -Original Message- From: Robert Metzger [mailto:rmetz...@apache.org] Sent: Monday, June 30, 2014 4:26 PM To: dev@flink.incubator.apache.org Subject: Re: Unable to compile both trunk and 0.5.1 code Hi Raymond, it's a known issue of the Oracle JDK 6 compiler. There is an issue filed at Oracle, but they don't support JDK 6 anymore, so they won't fix it. OpenJDK 6 has a fix, and all newer versions from Oracle also have fixes. Flink/Stratosphere runs on the buggy Oracle JDK 6 version; just the compiler won't work. Regards, Robert On Mon, Jun 30, 2014 at 10:20 AM, Liu, Raymond <raymond@intel.com> wrote: [snip: earlier messages quoted; see above]
RE: About StorageLevel
I think there is a shuffle stage involved, and the later count jobs will depend on the first job's shuffle stage's output data directly, as long as it is still available. Thus they will be much faster. Best Regards, Raymond Liu From: tomsheep...@gmail.com [mailto:tomsheep...@gmail.com] Sent: Friday, June 27, 2014 10:08 AM To: user Subject: Re: About StorageLevel Thank you Andrew, that's very helpful. I still have some doubts about a simple trial: I opened a Spark shell in local mode and typed in val r = sc.parallelize(0 to 50) and val r2 = r.keyBy(x => x).groupByKey(10), and then I invoked the count action on it several times: r2.count (multiple times). The first job obviously takes more time than the latter ones. Is there some magic underneath? Regards, Kang Liu From: Andrew Or <and...@databricks.com> Date: 2014-06-27 02:25 To: user <user@spark.apache.org> Subject: Re: About StorageLevel Hi Kang, You raise a good point. Spark does not automatically cache all your RDDs. Why? Simply because the application may create many RDDs, and not all of them are to be reused. After all, there is only so much memory available to each executor, and caching an RDD adds some overhead, especially if we have to kick out old blocks with LRU. As an example, say you run the following chain: sc.textFile(...).map(...).filter(...).flatMap(...).map(...).reduceByKey(...).count() You might be interested in reusing only the final result, but each step of the chain actually creates an RDD. If we automatically cached all RDDs, then we'd end up doing extra work for the RDDs we don't care about. The effect can be much worse if our RDDs are big and there are many of them, in which case there may be a lot of churn in the cache as we constantly evict RDDs we reuse. After all, the users know best which RDDs they are most interested in, so it makes sense to give them control over caching behavior. Best, Andrew 2014-06-26 5:36 GMT-07:00 tomsheep...@gmail.com: Hi all, I have a newbie question about the StorageLevel of Spark. I came across these sentences in the Spark documents: "If your RDDs fit comfortably with the default storage level (MEMORY_ONLY), leave them that way. This is the most CPU-efficient option, allowing operations on the RDDs to run as fast as possible." And: "Spark automatically monitors cache usage on each node and drops out old data partitions in a least-recently-used (LRU) fashion. If you would like to manually remove an RDD instead of waiting for it to fall out of the cache, use the RDD.unpersist() method." http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence But I found the default storageLevel is NONE in the source code, and if I never call 'persist(somelevel)', that value will always be NONE. The 'iterator' method goes to: final def iterator(split: Partition, context: TaskContext): Iterator[T] = { if (storageLevel != StorageLevel.NONE) { SparkEnv.get.cacheManager.getOrCompute(this, split, context, storageLevel) } else { computeOrReadCheckpoint(split, context) } } Is that to say, the RDDs are cached in memory (or somewhere else) if and only if the 'persist' or 'cache' method is called explicitly, and otherwise they will be re-computed every time, even in an iterative situation? It made me confused because I had a first impression that Spark is super-fast because it prefers to store intermediate results in memory automatically. Forgive me if I asked a stupid question. Regards, Kang Liu
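A small sketch contrasting the implicit shuffle-output reuse described above with explicit caching, which is the reliable way to keep r2's results around; it assumes a spark-shell style SparkContext sc:

    val r  = sc.parallelize(0 to 50)
    val r2 = r.keyBy(x => x).groupByKey(10)

    r2.count()   // job 1: runs the shuffle and writes shuffle output
    r2.count()   // later jobs can skip the map-side stage while that output survives

    // Explicit caching does not rely on shuffle files sticking around:
    r2.cache()
    r2.count()   // materializes the cache
    r2.count()   // served from the cached partitions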
RE: When does Spark switch from PROCESS_LOCAL to NODE_LOCAL or RACK_LOCAL?
If a task has no locality preference, it will also show up as PROCESS_LOCAL; I think we probably need to name it NO_PREFER to make it clearer. Not sure if this is your case. Best Regards, Raymond Liu From: coded...@gmail.com [mailto:coded...@gmail.com] On Behalf Of Sung Hwan Chung Sent: Friday, June 06, 2014 6:53 AM To: user@spark.apache.org Subject: Re: When does Spark switch from PROCESS_LOCAL to NODE_LOCAL or RACK_LOCAL? Additionally, I've encountered a confusing situation where the locality level for a task showed up as 'PROCESS_LOCAL' even though I didn't cache the data. I wonder whether some implicit caching happens even without the user specifying it. On Thu, Jun 5, 2014 at 3:50 PM, Sung Hwan Chung <coded...@cs.stanford.edu> wrote: Thanks Andrew. Is there a chance that even with full caching, modes other than PROCESS_LOCAL will be used? E.g., let's say an executor will try to perform tasks although the data are cached on a different executor. What I'd like to do is to prevent such a scenario entirely. I'd like to know if setting 'spark.locality.wait' to a very high value would guarantee that the mode will always be 'PROCESS_LOCAL'. On Thu, Jun 5, 2014 at 3:36 PM, Andrew Ash <and...@andrewash.com> wrote: The locality is how close the data is to the code that's processing it. PROCESS_LOCAL means data is in the same JVM as the code that's running, so it's really fast. NODE_LOCAL might mean that the data is in HDFS on the same node, or in another executor on the same node, so it is a little slower because the data has to travel across an IPC connection. RACK_LOCAL is even slower -- data is on a different server, so it needs to be sent over the network. Spark switches to lower locality levels when there's no unprocessed data on a node that has idle CPUs. In that situation you have two options: wait until the busy CPUs free up so you can start another task that uses data on that server, or start a new task on a farther away server that needs to bring data from that remote place. What Spark typically does is wait a bit in the hopes that a busy CPU frees up. Once that timeout expires, it starts moving the data from far away to the free CPU. The main tunable option is how long the scheduler waits before starting to move data rather than code. Those are the spark.locality.* settings here: http://spark.apache.org/docs/latest/configuration.html If you want to prevent this from happening entirely, you can set the values to ridiculously high numbers. The documentation also mentions that 0 has special meaning, so you can try that as well. Good luck! Andrew On Thu, Jun 5, 2014 at 3:13 PM, Sung Hwan Chung <coded...@cs.stanford.edu> wrote: I noticed that sometimes tasks would switch from PROCESS_LOCAL (I'd assume that this means fully cached) to NODE_LOCAL or even RACK_LOCAL. When these happen, things get extremely slow. Does this mean that the executor got terminated and restarted? Is there a way to prevent this from happening (barring the machine actually going down; I'd rather stick with the same process)?
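A sketch of the spark.locality.* tuning Andrew describes, with illustrative values only; very high waits trade scheduling latency for locality:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("locality-example")
      // How long to wait (milliseconds, in Spark 1.x) for a free CPU at the
      // preferred locality level before falling back to the next level.
      .set("spark.locality.wait", "10000")
      // Per-level overrides are also available:
      .set("spark.locality.wait.process", "60000")
      .set("spark.locality.wait.node", "30000")
      .set("spark.locality.wait.rack", "10000")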
RE: yarn-client mode question
It seems you are asking whether Spark-related jars need to be deployed to the YARN cluster manually before you launch an application? Then no, you don't, just like other YARN applications. And it doesn't matter whether it is yarn-client or yarn-cluster mode. Best Regards, Raymond Liu -Original Message- From: Sophia [mailto:sln-1...@163.com] Sent: Thursday, May 22, 2014 10:55 AM To: u...@spark.incubator.apache.org Subject: Re: yarn-client mode question But I don't understand this point: is it necessary to deploy Spark slave nodes on the YARN nodes? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/yarn-client-mode-question-tp6213p6216.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
RE: different in spark on yarn mode and standalone mode
At the core, they are not that different. In standalone mode, you have a Spark master and Spark workers that allocate the driver and executors for your Spark app, while in YARN mode, the YARN resource manager and node managers do this work. Once the driver and executors have been launched, the rest of the resource scheduling goes through the same process, i.e. between driver and executors through Akka actors. Best Regards, Raymond Liu -Original Message- From: Sophia [mailto:sln-1...@163.com] Hey you guys, What is the difference between Spark-on-YARN mode and standalone mode regarding resource scheduling? Wish you a happy every day. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/different-in-spark-on-yarn-mode-and-standalone-mode-tp5300.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
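A sketch showing that the application code stays the same across the two modes; only the master URL (and the cluster-side setup) changes. The host name is hypothetical:

    import org.apache.spark.{SparkConf, SparkContext}

    // Standalone mode: point at the Spark master that manages the workers.
    val standalone = new SparkConf().setAppName("app").setMaster("spark://master-host:7077")

    // YARN client mode: the YARN ResourceManager (located via HADOOP_CONF_DIR /
    // YARN_CONF_DIR) allocates the executors instead.
    val onYarn = new SparkConf().setAppName("app").setMaster("yarn-client")

    val sc = new SparkContext(onYarn)   // identical job code runs on either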
RE: How fast would you expect shuffle serialize to be?
I just tried to use the serializer to write objects directly in local mode with this code:

    import java.io.{BufferedOutputStream, FileOutputStream, OutputStream}
    import org.apache.spark.SparkEnv

    val datasize = args(1).toInt
    val dataset = (0 until datasize).map(i => ("asmallstring", i))

    val out: OutputStream =
      new BufferedOutputStream(new FileOutputStream(args(2)), 1024 * 100)

    val ser = SparkEnv.get.serializer.newInstance()
    val serOut = ser.serializeStream(out)

    dataset.foreach(value => serOut.writeObject(value))
    serOut.flush()
    serOut.close()

Thus one core on one disk. When using JavaSerializer, throughput is 10~12MB/s, and Kryo doubles that. So it seems to me that when running the full-path code in my previous case, 32 cores with 50MB/s total throughput is reasonable? Best Regards, Raymond Liu -Original Message- From: Liu, Raymond [mailto:raymond@intel.com] [snip: earlier thread messages; see below]
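A sketch of switching to Kryo for the comparison above, in the Spark 1.x configuration style of that era; the registrator class name is an illustrative placeholder:

    import com.esotericsoftware.kryo.Kryo
    import org.apache.spark.SparkConf
    import org.apache.spark.serializer.KryoRegistrator

    // Registering the concrete classes avoids writing full class names
    // per record, which is part of why Kryo is faster here.
    class MyRegistrator extends KryoRegistrator {
      override def registerClasses(kryo: Kryo) {
        kryo.register(classOf[(String, Int)])
      }
    }

    val conf = new SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.registrator", "MyRegistrator")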
RE: Shuffle Spill Issue
Hi Daniel, Thanks for your reply. While I think for reduceByKey, it will also do a map-side combine, so the result is the same: for each partition, one entry per distinct word. In my case with JavaSerializer, the 240MB dataset yields around 70MB of shuffle data. Only the Shuffle Spill (Memory) is abnormal, and it sounds to me like the spill should not trigger at all. And, by the way, this behavior only occurs on the map output side; on the reduce / shuffle fetch side this strange behavior doesn't happen. Best Regards, Raymond Liu From: Daniel Darabos [mailto:daniel.dara...@lynxanalytics.com] I have no idea why shuffle spill is so large. But this might make it smaller:

    val addition = (a: Int, b: Int) => a + b
    val wordsCount = wordsPair.combineByKey(identity, addition, addition)

This way only one entry per distinct word will end up in the shuffle for each partition, instead of one entry per word occurrence. On Tue, Apr 29, 2014 at 7:48 AM, Liu, Raymond <raymond@intel.com> wrote: [snip: quoted thread; see the messages below]
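A sketch of the knobs relevant to this report, using Spark 1.x-era settings (values illustrative); disabling spill was sometimes used to confirm a spurious-spill diagnosis like the one above:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      // Fraction of the heap usable for shuffle aggregation buffers before spilling.
      .set("spark.shuffle.memoryFraction", "0.3")
      // Turn spilling off entirely to test whether these spills are spurious;
      // risky on real workloads, since aggregation must then fit in memory.
      .set("spark.shuffle.spill", "false")
      // Compression of shuffle output, as toggled in the experiment above.
      .set("spark.shuffle.compress", "true")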
How fast would you expect shuffle serialize to be?
Hi, I am running a WordCount program which counts words from HDFS, and I noticed that the serializer part of the code takes a lot of CPU time. On a 16-core/32-thread node, the total throughput is around 50MB/s with JavaSerializer, and if I switch to KryoSerializer, it doubles to around 100-150MB/s. (I have 12 disks per node and files scatter across disks, so HDFS bandwidth is not a problem.) I also notice that, in this case, the object to write is (String, Int); if I try a case with (Int, Int), the throughput is a further 2-3x faster. So, in my WordCount case, the bottleneck is CPU (because with shuffle compression on, the 150MB/s data bandwidth on the input side will usually lead to around 50MB/s of shuffle data). This serialization bandwidth looks somehow too low, so I am wondering: what bandwidth do you observe in your case? Does this throughput sound reasonable to you? If not, is there anything that might need to be examined in my case? Best Regards, Raymond Liu
RE: How fast would you expect shuffle serialize to be?
For all the tasks; say, 32 tasks in total. Best Regards, Raymond Liu -Original Message- From: Patrick Wendell [mailto:pwend...@gmail.com] Is this the serialization throughput per task or the serialization throughput for all the tasks? On Tue, Apr 29, 2014 at 9:34 PM, Liu, Raymond <raymond@intel.com> wrote: [snip: original message quoted; see above]
RE: How fast would you expect shuffle serialize to be?
By the way, to be clear, I run repartition first to make all data go through shuffle, instead of running reduceByKey etc. directly (which would reduce the data that needs to be shuffled and serialized); thus essentially all of the 50MB/s data from HDFS goes to the serializer. (In fact, I also tried generating data in memory directly instead of reading from HDFS, with a similar throughput result.) Best Regards, Raymond Liu -Original Message- From: Liu, Raymond [mailto:raymond@intel.com] For all the tasks; say, 32 tasks in total. Best Regards, Raymond Liu -Original Message- From: Patrick Wendell [mailto:pwend...@gmail.com] Is this the serialization throughput per task or the serialization throughput for all the tasks? On Tue, Apr 29, 2014 at 9:34 PM, Liu, Raymond <raymond@intel.com> wrote: [snip: original message quoted; see above]
RE: How fast would you expect shuffle serialize to be?
The latter case: total throughput aggregated from all cores. Best Regards, Raymond Liu -Original Message- From: Patrick Wendell [mailto:pwend...@gmail.com] Sent: Wednesday, April 30, 2014 1:22 PM To: user@spark.apache.org Subject: Re: How fast would you expect shuffle serialize to be? Hm - I'm still not sure if you mean 100MB/s for each task = 3200MB/s across all cores -or- 3.1MB/s for each task = 100MB/s across all cores. If it's the second one, that's really slow and something is wrong. If it's the first one, this is in the range of what I'd expect, but I'm no expert. On Tue, Apr 29, 2014 at 10:14 PM, Liu, Raymond <raymond@intel.com> wrote: [snip: earlier messages quoted; see above]
Shuffle Spill Issue
Hi, I am running a simple word count program on a Spark standalone cluster. The cluster is made up of 6 nodes; each runs 4 workers and each worker owns 10G memory and 16 cores, thus 96 cores and 240G memory in total. (Well, it also used to be configured as 1 worker with 40G memory on each node.) I ran a very small data set (2.4GB on HDFS in total) to confirm the problem, as below. As you can read from part of the task metrics below, I noticed that the shuffle spill part of the metrics indicates that something is wrong.

Executor ID  Address      Task Time  Total Tasks  Failed Tasks  Succeeded Tasks  Shuffle Read  Shuffle Write  Shuffle Spill (Memory)  Shuffle Spill (Disk)
0            sr437:42139  29 s       4            0             4                0.0 B         4.3 MB         23.6 GB                 4.3 MB
1            sr433:46935  1.1 min    4            0             4                0.0 B         4.2 MB         19.0 GB                 3.4 MB
10           sr436:53277  26 s       4            0             4                0.0 B         4.3 MB         25.6 GB                 4.6 MB
11           sr437:58872  32 s       4            0             4                0.0 B         4.3 MB         25.0 GB                 4.4 MB
12           sr435:48358  27 s       4            0             4                0.0 B         4.3 MB         25.1 GB                 4.4 MB

You can see that the Shuffle Spill (Memory) is pretty high, almost 5000x the actual shuffle data and Shuffle Spill (Disk), and it also seems to me that by no means should the spill trigger, since the memory is not used up at all. To verify that, I further reduced the data size to 240MB in total, and here is the result:

Executor ID  Address      Task Time  Total Tasks  Failed Tasks  Succeeded Tasks  Shuffle Read  Shuffle Write  Shuffle Spill (Memory)  Shuffle Spill (Disk)
0            sr437:50895  15 s       4            0             4                0.0 B         703.0 KB       80.0 MB                 43.2 KB
1            sr433:50207  17 s       4            0             4                0.0 B         704.7 KB       389.5 MB                90.2 KB
10           sr436:56352  16 s       4            0             4                0.0 B         700.9 KB       814.9 MB                181.6 KB
11           sr437:53099  15 s       4            0             4                0.0 B         689.7 KB       0.0 B                   0.0 B
12           sr435:48318  15 s       4            0             4                0.0 B         702.1 KB       427.4 MB                90.7 KB
13           sr433:59294  17 s       4            0             4                0.0 B         704.8 KB       779.9 MB                180.3 KB

Nothing prevents the spill from happening. Now, it seems to me that there must be something wrong with the spill trigger code. Has anyone encountered this issue? By the way, I am using the latest trunk code. Best Regards, Raymond Liu
RE: Shuffle Spill Issue
Hi Patrick, I am just doing a simple word count; the data is generated by the Hadoop random text writer. This seems to me not quite related to compression. If I turn off compression on shuffle, the metrics are something like below for the smaller 240MB dataset:

Executor ID  Address      Task Time  Total Tasks  Failed Tasks  Succeeded Tasks  Shuffle Read  Shuffle Write  Shuffle Spill (Memory)  Shuffle Spill (Disk)
10           sr437:48527  35 s       8            0             8                0.0 B         2.5 MB         2.2 GB                  1291.2 KB
12           sr437:46077  34 s       8            0             8                0.0 B         2.5 MB         1822.6 MB               1073.3 KB
13           sr434:37896  31 s       8            0             8                0.0 B         2.4 MB         1099.2 MB               621.2 KB
15           sr438:52819  31 s       8            0             8                0.0 B         2.5 MB         1898.8 MB               1072.6 KB
16           sr434:37103  32 s       8            0             8                0.0 B         2.4 MB         1638.0 MB               1044.6 KB

And the program is pretty simple:

    val files = sc.textFile(args(1))
    val words = files.flatMap(_.split(" "))
    val wordsPair = words.map(x => (x, 1))
    val wordsCount = wordsPair.reduceByKey(_ + _)
    val count = wordsCount.count()
    println("Number of words = " + count)

Best Regards, Raymond Liu From: Patrick Wendell [mailto:pwend...@gmail.com] Could you explain more about what your job is doing and what data types you are using? These numbers alone don't necessarily indicate something is wrong. The relationship between the in-memory and on-disk shuffle amounts is definitely a bit strange; the data gets compressed when written to disk, but unless you have a weird dataset (e.g. all zeros) I wouldn't expect it to compress _that_ much. On Mon, Apr 28, 2014 at 1:18 AM, Liu, Raymond <raymond@intel.com> wrote: [snip: original message quoted; see above]
RE: Does yarn-stable still accept pull request?
It should be fixed in both the alpha and stable code bases, since we aim to support both versions.

Best Regards,
Raymond Liu

-----Original Message-----
From: Nan Zhu [mailto:zhunanmcg...@gmail.com]
Sent: Wednesday, February 12, 2014 10:29 AM
To: dev@spark.incubator.apache.org
Subject: Does yarn-stable still accept pull request?

Hi, all

I'm a new user of spark-yarn. I would like to create a pull request for an issue found in my usage. Where should I modify the code, stable or alpha (the problem exists in both)?

Best,

--
Nan Zhu
RE: Spark Master on Hadoop Job Tracker?
Not sure what you aim to solve. When you mention the Spark Master, I guess you probably mean spark standalone mode? In that case the spark cluster is not necessarily coupled with the hadoop cluster. If you aim to achieve better data locality, then yes, running spark workers on the HDFS data nodes might help. And as for the spark Master, I think its placement doesn't matter much.

Best Regards,
Raymond Liu

-----Original Message-----
From: mharwida [mailto:majdharw...@yahoo.com]
Sent: Tuesday, January 21, 2014 2:14 AM
To: user@spark.incubator.apache.org
Subject: Spark Master on Hadoop Job Tracker?

Hi,

Should the Spark Master run on the Hadoop Job Tracker node (and Spark workers on Task Trackers), or could the Spark Master reside on any Hadoop node?

Thanks
Majd

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Master-on-Hadoop-Job-Tracker-tp680.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
RE: Anyone know how to submit spark job to yarn in java code?
Hi

Regarding your questions:

1) When I run the above script, which jar is submitted to the yarn server?

What the SPARK_JAR env points to and what --jar points to are both submitted to the yarn server.

2) It looks like spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar plays the role of the client side and spark-examples-assembly-0.8.1-incubating.jar goes with the spark runtime and examples which will be running in yarn, am I right?

The spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar will also go to the yarn cluster, as the runtime for the app jar (spark-examples-assembly-0.8.1-incubating.jar).

3) Does anyone have any similar experience?

You can use yarn-client mode. You might want to take a look at docs/running-on-yarn.md, and probably try the master branch to check our latest updates on this part of the docs. In yarn-client mode, the SparkContext itself does a similar thing to what the command line does to submit a yarn job. Then, to use it from java, you might want to try JavaSparkContext instead of SparkContext. I don't personally run it with complicated applications, but a small example app did work.

Best Regards,
Raymond Liu

-----Original Message-----
From: John Zhao [mailto:jz...@alpinenow.com]
Sent: Thursday, January 16, 2014 2:25 AM
To: user@spark.incubator.apache.org
Subject: Anyone know how to submit spark job to yarn in java code?

Now I am working on a web application and I want to submit a spark job to hadoop yarn. I have already done my own assembly and can run it on the command line with the following script:

    export YARN_CONF_DIR=/home/gpadmin/clusterConfDir/yarn
    export SPARK_JAR=./assembly/target/scala-2.9.3/spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar
    ./spark-class org.apache.spark.deploy.yarn.Client --jar ./examples/target/scala-2.9.3/spark-examples-assembly-0.8.1-incubating.jar --class org.apache.spark.examples.SparkPi --args yarn-standalone --num-workers 3 --master-memory 1g --worker-memory 512m --worker-cores 1

It works fine. Then I realized that it is hard to submit the job from a web application. It looks like the spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar or spark-examples-assembly-0.8.1-incubating.jar is a really big jar; I believe it contains everything. So my questions are:

1) When I run the above script, which jar is submitted to the yarn server?
2) It looks like the spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar plays the role of the client side and spark-examples-assembly-0.8.1-incubating.jar goes with the spark runtime and examples which will be running in yarn, am I right?
3) Does anyone have any similar experience? I did lots of hadoop MR stuff and want to follow the same logic to submit a spark job. For now I can only find the command line way to submit a spark job to yarn. I believe there is an easy way to integrate spark in a web application.

Thanks.
John.
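A minimal Scala sketch of that yarn-client route, assuming a 0.8.x-era build with SPARK_JAR and SPARK_YARN_APP_JAR exported in the web application's environment as docs/running-on-yarn.md describes; the app name and job here are illustrative only:

    import org.apache.spark.SparkContext

    // In yarn-client mode the SparkContext negotiates with YARN directly,
    // so no explicit org.apache.spark.deploy.yarn.Client invocation is needed.
    val sc = new SparkContext("yarn-client", "WebAppSparkJob")

    // Trivial sanity job to confirm executors came up on the cluster.
    val sum = sc.parallelize(1 to 1000, 4).reduce(_ + _)
    println("sum = " + sum)

    sc.stop()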
RE: yarn, fat-jars and lib_managed
I think you could put the spark jar, and the other jars your app depends on that don't change a lot, on HDFS, and use --files or --addJars (depending on the mode you run, YarnClient/YarnStandalone) to refer to them. Then you only need to redeploy your thin app jar on each invocation.

Best Regards,
Raymond Liu

-----Original Message-----
From: Alex Cozzi [mailto:alexco...@gmail.com]
Sent: Friday, January 10, 2014 5:32 AM
To: dev@spark.incubator.apache.org
Subject: yarn, fat-jars and lib_managed

I am just starting out playing with spark on our hadoop 2.2 cluster and I have a question. The current way to submit jobs to the cluster is to create fat-jars with sbt assembly. This approach works but I think it is less than optimal in many large hadoop installations: the way we interact with the cluster is to log into a CLI machine, which is the only one authorized to submit jobs. Now, I can not use the CLI machine as a dev environment since, for security reasons, the CLI machine and hadoop cluster are fire-walled and can not reach out to the internet, so sbt and maven resolution do not work.

So the procedure now is:
- hack code
- sbt assembly
- rsync my spark directory to the CLI machine
- run my job

The issue is that every time I need to shuttle large binary files (all the fat-jars) back and forth; they are about 120MB now, which is slow, particularly when I am working remotely from home.

I was wondering whether a better solution would be to create normal thin-jars of my code, which are very small, less than a MB, and pose no problem to copy to the cluster every time, but to take advantage of the sbt-created directory lib_managed to handle dependencies. We already have this directory that sbt manages with all the needed dependencies for the job to run. Wouldn't it be possible to have the Spark Yarn Client take care of adding all the jars in lib_managed to the class path and distribute them to the workers automatically? (They could also be cached across invocations of spark; after all, those jars are versioned and immutable, with the possible exception of -SNAPSHOT releases.)

I think that this would greatly simplify the development procedure and remove the need for messing with ADD_JAR and SPARK_CLASSPATH. What do you think?

Alex
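A programmatic variant of the same idea, sketched under the assumption that the rarely-changing dependency jars have already been uploaded to HDFS; all paths and names are hypothetical:

    import org.apache.spark.SparkContext

    val sc = new SparkContext("yarn-client", "ThinJarApp")

    // Heavy, rarely-changing dependency jars live on HDFS; executors fetch
    // them at runtime, so they never need to be baked into a fat jar.
    sc.addJar("hdfs:///shared/libs/dependency-1.0.jar")

    // Only this small jar needs to be re-deployed on each code change.
    sc.addJar("hdfs:///user/alex/thin-app.jar")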
RE: Spark on Yarn classpath problems
Not found in which part of the code? If in the SparkContext thread (say, on the AM), --addJars should work. If in tasks, then --addJars won't work; you need to use --files=local://xxx etc. — I'm not sure whether that is available in 0.8.1. And adding everything to a single jar should also work; if it doesn't, something might be wrong with the assembly?

Best Regards,
Raymond Liu

From: Eric Kimbrel [mailto:lekimb...@gmail.com]
Sent: Wednesday, January 08, 2014 11:16 AM
To: user@spark.incubator.apache.org
Subject: Spark on Yarn classpath problems

I am trying to run spark version 0.8.1 on hadoop 2.2.0-cdh5.0.0-beta-1 with YARN. I am using the YARN Client with yarn-standalone mode as described here: http://spark.incubator.apache.org/docs/latest/running-on-yarn.html

To simplify matters, I'll say my application code is all contained in application.jar, and it additionally depends on code in dependency.jar.

I launch my spark application as follows:

    SPARK_JAR=SPARK_ASSEMBLY_JAR_FILE ./spark-class org.apache.spark.deploy.yarn.Client \
      --jar application.jar \
      --class <my main class> \
      --args <app specific arguments> \
      --num-workers NUMBER_OF_WORKER_MACHINES \
      --master-memory MEMORY_FOR_MASTER \
      --worker-memory MEMORY_PER_WORKER \
      --worker-cores CORES_PER_WORKER \
      --name application_name \
      --addJars dependency.jar

Yarn loads the job and starts to execute, but as the job runs it quickly dies on class-not-found exceptions for classes that are specified in dependency.jar.

As an attempted fix I tried including all of the dependencies in a single jar, application-with-dependencies.jar. I specify this jar with the --jar option and remove the --addJars line. Unfortunately this did not alleviate the issue, and the class-not-found exceptions continued.
RE: compiling against hadoop 2.2
I think you also need to set yarn.version. Say something like:

    mvn -Pyarn -Dhadoop.version=2.2.0 -Dyarn.version=2.2.0 -DskipTests clean package

hadoop.version defaults to 2.2.0 while yarn.version does not when you choose the new-yarn profile. We probably need to fix this later for easier usage.

Best Regards,
Raymond Liu

-----Original Message-----
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Friday, January 03, 2014 1:07 PM
To: dev@spark.incubator.apache.org
Subject: compiling against hadoop 2.2

Hi,
I used the following command to compile against hadoop 2.2:

    mvn clean package -DskipTests -Pnew-yarn

But I got a lot of compilation errors. Did I use the wrong command?

Cheers
RE: compiling against hadoop 2.2
Sorry, it should be:

    mvn -Pnew-yarn -Dhadoop.version=2.2.0 -Dyarn.version=2.2.0 -DskipTests clean package

The one in the previous mail is not yet available.

Best Regards,
Raymond Liu

-----Original Message-----
From: Liu, Raymond
Sent: Friday, January 03, 2014 2:09 PM
To: dev@spark.incubator.apache.org
Subject: RE: compiling against hadoop 2.2

I think you also need to set yarn.version. Say something like:

    mvn -Pyarn -Dhadoop.version=2.2.0 -Dyarn.version=2.2.0 -DskipTests clean package

hadoop.version defaults to 2.2.0 while yarn.version does not when you choose the new-yarn profile. We probably need to fix this later for easier usage.

Best Regards,
Raymond Liu

-----Original Message-----
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Friday, January 03, 2014 1:07 PM
To: dev@spark.incubator.apache.org
Subject: compiling against hadoop 2.2

Hi,
I used the following command to compile against hadoop 2.2:

    mvn clean package -DskipTests -Pnew-yarn

But I got a lot of compilation errors. Did I use the wrong command?

Cheers
RE: compiling against hadoop 2.2
Yep, you are right. We will merge in new code on this part pretty soon (maybe today? I hope so), which might shift a few lines.

Best Regards,
Raymond Liu

-----Original Message-----
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Friday, January 03, 2014 2:21 PM
To: dev@spark.incubator.apache.org
Subject: Re: compiling against hadoop 2.2

The specification of yarn.version can be inserted following this line (#762 in pom.xml), right?

    <hadoop.version>2.2.0</hadoop.version>

On Thu, Jan 2, 2014 at 10:10 PM, Liu, Raymond raymond@intel.com wrote:

Sorry, it should be:

    mvn -Pnew-yarn -Dhadoop.version=2.2.0 -Dyarn.version=2.2.0 -DskipTests clean package

The one in the previous mail is not yet available.

Best Regards,
Raymond Liu

-----Original Message-----
From: Liu, Raymond
Sent: Friday, January 03, 2014 2:09 PM
To: dev@spark.incubator.apache.org
Subject: RE: compiling against hadoop 2.2

I think you also need to set yarn.version. Say something like:

    mvn -Pyarn -Dhadoop.version=2.2.0 -Dyarn.version=2.2.0 -DskipTests clean package

hadoop.version defaults to 2.2.0 while yarn.version does not when you choose the new-yarn profile. We probably need to fix this later for easier usage.

Best Regards,
Raymond Liu

-----Original Message-----
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Friday, January 03, 2014 1:07 PM
To: dev@spark.incubator.apache.org
Subject: compiling against hadoop 2.2

Hi,
I used the following command to compile against hadoop 2.2:

    mvn clean package -DskipTests -Pnew-yarn

But I got a lot of compilation errors. Did I use the wrong command?

Cheers
RE: compiling against hadoop 2.2
And I am not sure whether it is valuable to provide different settings for the hadoop/hdfs and yarn versions. When building with SBT, they will always be the same. Maybe we should do the same in mvn too.

Best Regards,
Raymond Liu

-----Original Message-----
From: Liu, Raymond
Sent: Friday, January 03, 2014 2:29 PM
To: dev@spark.incubator.apache.org
Subject: RE: compiling against hadoop 2.2

Yep, you are right. We will merge in new code on this part pretty soon (maybe today? I hope so), which might shift a few lines.

Best Regards,
Raymond Liu

-----Original Message-----
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Friday, January 03, 2014 2:21 PM
To: dev@spark.incubator.apache.org
Subject: Re: compiling against hadoop 2.2

The specification of yarn.version can be inserted following this line (#762 in pom.xml), right?

    <hadoop.version>2.2.0</hadoop.version>

On Thu, Jan 2, 2014 at 10:10 PM, Liu, Raymond raymond@intel.com wrote:

Sorry, it should be:

    mvn -Pnew-yarn -Dhadoop.version=2.2.0 -Dyarn.version=2.2.0 -DskipTests clean package

The one in the previous mail is not yet available.

Best Regards,
Raymond Liu

-----Original Message-----
From: Liu, Raymond
Sent: Friday, January 03, 2014 2:09 PM
To: dev@spark.incubator.apache.org
Subject: RE: compiling against hadoop 2.2

I think you also need to set yarn.version. Say something like:

    mvn -Pyarn -Dhadoop.version=2.2.0 -Dyarn.version=2.2.0 -DskipTests clean package

hadoop.version defaults to 2.2.0 while yarn.version does not when you choose the new-yarn profile. We probably need to fix this later for easier usage.

Best Regards,
Raymond Liu

-----Original Message-----
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Friday, January 03, 2014 1:07 PM
To: dev@spark.incubator.apache.org
Subject: compiling against hadoop 2.2

Hi,
I used the following command to compile against hadoop 2.2:

    mvn clean package -DskipTests -Pnew-yarn

But I got a lot of compilation errors. Did I use the wrong command?

Cheers
RE: Errors with spark-0.8.1 hadoop-yarn 2.2.0
Hi Izhar

Is that the exact command you are running? Say, with 0.8.0 instead of 0.8.1 in the cmd?

Raymond Liu

From: Izhar ul Hassan [mailto:ezh...@gmail.com]
Sent: Friday, December 27, 2013 9:40 PM
To: user@spark.incubator.apache.org
Subject: Errors with spark-0.8.1 hadoop-yarn 2.2.0

Hi,

I have a 3 node installation of hadoop 2.2.0 with yarn. I have installed spark-0.8.1 with yarn support enabled. I get the following errors when trying to run the examples:

    SPARK_JAR=./assembly/target/scala-2.9.3/spark-assembly-0.8.0-incubating-hadoop2.0.5-alpha.jar \
    ./spark-class org.apache.spark.deploy.yarn.Client \
      --jar examples/target/scala-2.9.3/spark-examples-assembly-0.8.0-incubating.jar \
      --class org.apache.spark.examples.SparkPi \
      --args yarn-standalone \
      --num-workers 3 \
      --master-memory 4g \
      --worker-memory 2g \
      --worker-cores 1

    Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/deploy/yarn/Client
    Caused by: java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.Client
        at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
    Could not find the main class: org.apache.spark.deploy.yarn.Client. Program will exit.

spark-0.8.0 with hadoop 2.0.5-alpha works fine.

--
/Izhar
RE: Unable to load additional JARs in yarn-client mode
Ido, when you say add external JARs, do you mean via --addJars, which adds jars for the SparkContext to use in the AM env? If so, I think you don't need it for yarn-client mode at all: in yarn-client mode the SparkContext runs locally, so I think you just need to make sure those jars are on the java classpath. And for those needed by the executors / tasks, I think you can package them as Matei said. Or maybe we can expose some env for yarn-client mode to allow adding multiple jars as needed.

Best Regards,
Raymond Liu

From: Matei Zaharia [mailto:matei.zaha...@gmail.com]
Sent: Tuesday, December 24, 2013 1:17 PM
To: user@spark.incubator.apache.org
Subject: Re: Unable to load additional JARs in yarn-client mode

I'm surprised by this, but one way that will definitely work is to assemble your application into a single JAR. If passing them to the constructor doesn't work, that's probably a bug.

Matei

On Dec 23, 2013, at 12:03 PM, Karavany, Ido ido.karav...@intel.com wrote:

Hi All,

For our application we need to use the yarn-client mode featured in 0.8.1 (Yarn 2.0.5). We've successfully executed our java applications in both yarn-client and yarn-standalone mode. While in yarn-standalone there is a way to add external JARs, we couldn't find a way to add those in yarn-client. Adding jars in the spark context constructor or setting SPARK_CLASSPATH didn't work either.

Are we missing something? Can you please advise? If it is currently impossible, can you advise a patch / workaround? It is crucial for us to get it working with external dependencies.

Many Thanks,
Ido
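For reference, the constructor route Matei mentions looks roughly like this against a 0.8.x-era API — a sketch only, with hypothetical paths; if this does not ship the jars to executors in yarn-client mode, that would be the bug he suspects:

    import org.apache.spark.SparkContext

    // Jars listed here are meant to be distributed to executors
    // automatically; the driver itself runs locally in yarn-client mode.
    val sc = new SparkContext(
      "yarn-client",
      "AppWithExternalJars",
      System.getenv("SPARK_HOME"),
      Seq("/path/to/dependency.jar")) // hypothetical path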
RE: About spark.driver.host
It's what it says in the document. For yarn-standalone mode, it will be the host where the spark AM runs, while for yarn-client mode, it will be the local host where you run the cmd. And what's the cmd you ran SparkPi with? I think you actually don't need to set spark.driver.host manually for Yarn mode; SparkContext will handle it for you automatically and pass it to the AM and Executors to use to connect to the Driver. Did you follow the guide in docs/running-on-yarn.md?

Best Regards,
Raymond Liu

From: Azuryy Yu [mailto:azury...@gmail.com]
Sent: Tuesday, December 17, 2013 11:16 AM
To: user@spark.incubator.apache.org
Subject: About spark.driver.host

Hi,

I am using spark-0.8.1, and what's the meaning of spark.driver.host? I ran SparkPi and it failed (either yarn-standalone or yarn-client).

It is described as 'Hostname or IP address for the driver to listen on.' in the document, but what host will the Driver listen on? The RM on the yarn? If yes, I configured spark.driver.host in spark-env.sh as the resource manager host and port:

    export SPARK_DAEMON_JAVA_OPTS="-Dspark.driver.host=10.2.8.1 -Dspark.driver.port=8032"

but it doesn't work. I find in the log:

    WARN yarn.ApplicationMaster: Failed to connect to driver at null:null, retrying ...

Even if I add these two system variables to the JAVA_OPTS in bin/spark-class, it also doesn't work. Please help; any inputs are appreciated.
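A small sketch of that point, assuming a 0.8.x-era build where the driver-side values are surfaced as system properties once the SparkContext is up; nothing needs to be configured by hand:

    import org.apache.spark.SparkContext

    // No spark.driver.host/port set anywhere: SparkContext picks them itself
    // and passes them to the AM and executors.
    val sc = new SparkContext("yarn-client", "DriverHostDemo")
    println("driver host = " + System.getProperty("spark.driver.host"))
    println("driver port = " + System.getProperty("spark.driver.port"))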
RE: About spark.driver.host
Hmm, I don't see which mode you are trying to use. Do you specify MASTER in the conf file? I think in the run-on-yarn doc, the example for yarn-standalone mode mentions that you also need to pass in --args yarn-standalone for the Client etc. And if using yarn-client mode, you don't need to invoke the Client yourself; instead use something like:

    SPARK_JAR=xxx SPARK_YARN_APP_JAR=xxx ./run-example org.apache.spark.examples.SparkPi yarn-client

Best Regards,
Raymond Liu

From: Azuryy Yu [mailto:azury...@gmail.com]
Sent: Tuesday, December 17, 2013 12:43 PM
To: user@spark.incubator.apache.org
Subject: Re: About spark.driver.host

Raymond:

Additionally: yes, I built Spark-0.8.1 with -Pnew-yarn, and I followed running-on-yarn.md strictly. The Spark web UI shows good for everything.

On Tue, Dec 17, 2013 at 12:36 PM, Azuryy Yu azury...@gmail.com wrote:

Thanks, Raymond!

My command for Yarn mode:

    SPARK_JAR=spark-0.8.1/lib/spark-assembly_2.9.3-0.8.1-incubating-hadoop1.2.1.jar ./spark-0.8.1/bin/spark-class org.apache.spark.deploy.yarn.Client --jar spark-0.8.1/spark-examples_2.9.3-0.8.1-incubating.jar --class org.apache.spark.examples.SparkPi

Please ignore the hadoop version; it's our customized one, which is hadoop-2x actually. But if I don't set spark.driver.*, the App Master cannot start. Here is the log:

    13/12/17 11:07:13 INFO yarn.ApplicationMaster: Starting the user JAR in a separate Thread
    13/12/17 11:07:13 INFO yarn.ApplicationMaster: Waiting for Spark driver to be reachable.
    13/12/17 11:07:13 WARN yarn.ApplicationMaster: Failed to connect to driver at null:null, retrying ...
    Usage: SparkPi <master> [<slices>]
    13/12/17 11:07:13 WARN yarn.ApplicationMaster: Failed to connect to driver at null:null, retrying ...
    13/12/17 11:07:13 INFO yarn.ApplicationMaster: AppMaster received a signal.
    13/12/17 11:07:13 WARN yarn.ApplicationMaster: Failed to connect to driver at null:null, retrying ...

After retrying 'spark.yarn.applicationMaster.waitTries' times (default 10), the job failed.

On Tue, Dec 17, 2013 at 12:07 PM, Liu, Raymond raymond@intel.com wrote:

It's what it says in the document. For yarn-standalone mode, it will be the host where the spark AM runs, while for yarn-client mode, it will be the local host where you run the cmd. And what's the cmd you ran SparkPi with? I think you actually don't need to set spark.driver.host manually for Yarn mode; SparkContext will handle it for you automatically and pass it to the AM and Executors to use to connect to the Driver. Did you follow the guide in docs/running-on-yarn.md?

Best Regards,
Raymond Liu

From: Azuryy Yu [mailto:azury...@gmail.com]
Sent: Tuesday, December 17, 2013 11:16 AM
To: user@spark.incubator.apache.org
Subject: About spark.driver.host

Hi,

I am using spark-0.8.1, and what's the meaning of spark.driver.host? I ran SparkPi and it failed (either yarn-standalone or yarn-client). It is described as 'Hostname or IP address for the driver to listen on.' in the document, but what host will the Driver listen on? The RM on the yarn? If yes, I configured spark.driver.host in spark-env.sh as the resource manager host and port:

    export SPARK_DAEMON_JAVA_OPTS="-Dspark.driver.host=10.2.8.1 -Dspark.driver.port=8032"

but it doesn't work. I find in the log:

    WARN yarn.ApplicationMaster: Failed to connect to driver at null:null, retrying ...

Even if I add these two system variables to the JAVA_OPTS in bin/spark-class, it also doesn't work. Please help; any inputs are appreciated.
RE: About spark.driver.host
No, the name originates from the standard standalone mode, with a yarn prefix added to distinguish it, I think. But it does run on the yarn cluster. About the way they run and the difference between yarn-standalone mode and yarn-client mode, the doc also has the details. In short, yarn-standalone has the spark-context (thus the Driver) run together with the AM on yarn, while yarn-client has the spark-context run on the local machine where you launch the cmd. For both modes, the executors all run on the yarn cluster.

Best Regards,
Raymond Liu

From: Azuryy Yu [mailto:azury...@gmail.com]
Sent: Tuesday, December 17, 2013 1:19 PM
To: user@spark.incubator.apache.org
Subject: Re: About spark.driver.host

Hi Raymond,

I specified Master and Slaves in the conf. As for yarn-standalone and yarn-client, I have some confusion: if I use yarn-standalone, does that mean it's not run on the yarn cluster, only pseudo-distributed?

On Tue, Dec 17, 2013 at 1:03 PM, Liu, Raymond raymond@intel.com wrote:

Hmm, I don't see which mode you are trying to use. Do you specify MASTER in the conf file? I think in the run-on-yarn doc, the example for yarn-standalone mode mentions that you also need to pass in --args yarn-standalone for the Client etc. And if using yarn-client mode, you don't need to invoke the Client yourself; instead use something like:

    SPARK_JAR=xxx SPARK_YARN_APP_JAR=xxx ./run-example org.apache.spark.examples.SparkPi yarn-client

Best Regards,
Raymond Liu

From: Azuryy Yu [mailto:azury...@gmail.com]
Sent: Tuesday, December 17, 2013 12:43 PM
To: user@spark.incubator.apache.org
Subject: Re: About spark.driver.host

Raymond:

Additionally: yes, I built Spark-0.8.1 with -Pnew-yarn, and I followed running-on-yarn.md strictly. The Spark web UI shows good for everything.

On Tue, Dec 17, 2013 at 12:36 PM, Azuryy Yu azury...@gmail.com wrote:

Thanks, Raymond!

My command for Yarn mode:

    SPARK_JAR=spark-0.8.1/lib/spark-assembly_2.9.3-0.8.1-incubating-hadoop1.2.1.jar ./spark-0.8.1/bin/spark-class org.apache.spark.deploy.yarn.Client --jar spark-0.8.1/spark-examples_2.9.3-0.8.1-incubating.jar --class org.apache.spark.examples.SparkPi

Please ignore the hadoop version; it's our customized one, which is hadoop-2x actually. But if I don't set spark.driver.*, the App Master cannot start. Here is the log:

    13/12/17 11:07:13 INFO yarn.ApplicationMaster: Starting the user JAR in a separate Thread
    13/12/17 11:07:13 INFO yarn.ApplicationMaster: Waiting for Spark driver to be reachable.
    13/12/17 11:07:13 WARN yarn.ApplicationMaster: Failed to connect to driver at null:null, retrying ...
    Usage: SparkPi <master> [<slices>]
    13/12/17 11:07:13 WARN yarn.ApplicationMaster: Failed to connect to driver at null:null, retrying ...
    13/12/17 11:07:13 INFO yarn.ApplicationMaster: AppMaster received a signal.
    13/12/17 11:07:13 WARN yarn.ApplicationMaster: Failed to connect to driver at null:null, retrying ...

After retrying 'spark.yarn.applicationMaster.waitTries' times (default 10), the job failed.

On Tue, Dec 17, 2013 at 12:07 PM, Liu, Raymond raymond@intel.com wrote:

It's what it says in the document. For yarn-standalone mode, it will be the host where the spark AM runs, while for yarn-client mode, it will be the local host where you run the cmd. And what's the cmd you ran SparkPi with? I think you actually don't need to set spark.driver.host manually for Yarn mode; SparkContext will handle it for you automatically and pass it to the AM and Executors to use to connect to the Driver. Did you follow the guide in docs/running-on-yarn.md?

Best Regards,
Raymond Liu

From: Azuryy Yu [mailto:azury...@gmail.com]
Sent: Tuesday, December 17, 2013 11:16 AM
To: user@spark.incubator.apache.org
Subject: About spark.driver.host

Hi,

I am using spark-0.8.1, and what's the meaning of spark.driver.host? I ran SparkPi and it failed (either yarn-standalone or yarn-client). It is described as 'Hostname or IP address for the driver to listen on.' in the document, but what host will the Driver listen on? The RM on the yarn? If yes, I configured spark.driver.host in spark-env.sh as the resource manager host and port:

    export SPARK_DAEMON_JAVA_OPTS="-Dspark.driver.host=10.2.8.1 -Dspark.driver.port=8032"

but it doesn't work. I find in the log:

    WARN yarn.ApplicationMaster: Failed to connect to driver at null:null, retrying ...

Even if I add these two system variables to the JAVA_OPTS in bin/spark-class, it also doesn't work. Please help; any inputs are appreciated.
RE: [VOTE] Release Apache Spark 0.8.1-incubating (rc4)
Hi Azuryy

Please check https://spark-project.atlassian.net/browse/SPARK-995 for this protobuf version issue.

Best Regards,
Raymond Liu

-----Original Message-----
From: Azuryy Yu [mailto:azury...@gmail.com]
Sent: Monday, December 16, 2013 10:30 AM
To: dev@spark.incubator.apache.org
Subject: Re: [VOTE] Release Apache Spark 0.8.1-incubating (rc4)

Hi here,

Do we have a plan to upgrade protobuf from 2.4.1 to 2.5.0? PB has some incompatible APIs between these two versions. Hadoop-2.x uses protobuf-2.5.0, but if some guys want to run Spark on mesos, then mesos currently uses protobuf-2.4.1, so we may discuss here for a better solution.

On Mon, Dec 16, 2013 at 7:42 AM, Azuryy Yu azury...@gmail.com wrote:
Thanks Patrick.

On 16 Dec 2013 02:43, Patrick Wendell pwend...@gmail.com wrote:
You can check out the docs mentioned in the vote thread. There is also a pre-built binary for hadoop2 that is compiled for YARN 2.2.
- Patrick

On Sun, Dec 15, 2013 at 4:31 AM, Azuryy Yu azury...@gmail.com wrote:
yarn 2.2, not yarn 0.22, I am so sorry.

On Sun, Dec 15, 2013 at 8:31 PM, Azuryy Yu azury...@gmail.com wrote:
Hi,
Spark-0.8.1 supports yarn 0.22, right? Where can I find the release notes?
Thanks.

On Sun, Dec 15, 2013 at 3:20 AM, Henry Saputra henry.sapu...@gmail.com wrote:
Yeah, seems like it. He was OK with our prev release. Let's wait for his reply.

On Saturday, December 14, 2013, Patrick Wendell wrote:
Henry - from that thread it looks like sebb's concern was something different from this.

On Sat, Dec 14, 2013 at 11:08 AM, Henry Saputra henry.sapu...@gmail.com wrote:
Hi Patrick,
Yeap, I agree, but technically an ASF VOTE releases source only — there is even debate about it =) — so putting it in the vote staging artifact could confuse people, because in our case we do package 3rd party libraries in the binary jars. I have sent an email to sebb asking for clarification about his concern on the general@ list.
- Henry

On Sat, Dec 14, 2013 at 10:56 AM, Patrick Wendell pwend...@gmail.com wrote:
Hey Henry,
One thing a lot of people do during the vote is test the binaries and make sure they work. This is really valuable. If you'd like, I could add a caveat to the vote thread explaining that we are only voting on the source.
- Patrick

On Sat, Dec 14, 2013 at 10:40 AM, Henry Saputra henry.sapu...@gmail.com wrote:
Actually we should be fine putting the binaries there as long as the VOTE is for the source. Let's verify with sebb on the general@ list about his concern.
- Henry

On Sat, Dec 14, 2013 at 10:31 AM, Henry Saputra henry.sapu...@gmail.com wrote:
Hi Patrick, as sebb has mentioned, let's move the binaries from the voting directory in your people.apache.org directory. ASF release voting is for source code and not binaries; technically we provide binaries for convenience. And add a link to the KEYS location in the dist [1] to let people verify signatures. Sorry for the late response to the VOTE thread, guys.
- Henry
[1] https://dist.apache.org/repos/dist/release/incubator/spark/KEYS

On Fri, Dec 13, 2013 at 6:37 PM, Patrick Wendell pwend...@gmail.com wrote:
The vote is now closed. This vote passes with 5 PPMC +1's and no 0 or -1 votes.
+1 (5 Total): Matei Zaharia*, Nick Pentreath*, Patrick Wendell*, Prashant Sharma*, Tom Graves*
0 (0 Total)
-1 (0 Total)
* = Binding Vote
As per the incubator release guide [1] I'll be sending this to the general incubator list for a final vote from IPMC members.
[1] http://incubator.apache.org/guides/releasemanagement.html#best-practice-incubator-release-vote

On Thu, Dec 12, 2013 at 8:59 AM, Evan Chan e...@ooyala.com wrote:
I'd be personally fine with a standard workflow of assemble-deps + packaging just the Spark files as separate packages, if it speeds up everyone's development time.

On Wed, Dec 11, 2013 at 1:10 PM, Mark Hamstra m...@clearstorydata.com wrote:
I don't know how to make sense of the numbers, but here's what I've got from a very small sample size.
RE: [VOTE] Release Apache Spark 0.8.1-incubating (rc4)
That issue is for 0.9's solution. And if you mean 0.8.1: when you build against hadoop 2.2 Yarn, protobuf is already 2.5.0 instead of 2.4.1, so it will work fine with hadoop 2.2. And regarding an 0.8.1 built against hadoop 2.2 Yarn while running upon mesos... a strange combination; I am not sure, it might have problems. If it does, you might need to build mesos against 2.5.0. I haven't tested that; if you have time, would you mind running a test?

Best Regards,
Raymond Liu

-----Original Message-----
From: Liu, Raymond [mailto:raymond@intel.com]
Sent: Monday, December 16, 2013 10:48 AM
To: dev@spark.incubator.apache.org
Subject: RE: [VOTE] Release Apache Spark 0.8.1-incubating (rc4)

Hi Azuryy

Please check https://spark-project.atlassian.net/browse/SPARK-995 for this protobuf version issue.

Best Regards,
Raymond Liu

-----Original Message-----
From: Azuryy Yu [mailto:azury...@gmail.com]
Sent: Monday, December 16, 2013 10:30 AM
To: dev@spark.incubator.apache.org
Subject: Re: [VOTE] Release Apache Spark 0.8.1-incubating (rc4)

Hi here,

Do we have a plan to upgrade protobuf from 2.4.1 to 2.5.0? PB has some incompatible APIs between these two versions. Hadoop-2.x uses protobuf-2.5.0, but if some guys want to run Spark on mesos, then mesos currently uses protobuf-2.4.1, so we may discuss here for a better solution.

On Mon, Dec 16, 2013 at 7:42 AM, Azuryy Yu azury...@gmail.com wrote:
Thanks Patrick.

On 16 Dec 2013 02:43, Patrick Wendell pwend...@gmail.com wrote:
You can check out the docs mentioned in the vote thread. There is also a pre-built binary for hadoop2 that is compiled for YARN 2.2.
- Patrick

On Sun, Dec 15, 2013 at 4:31 AM, Azuryy Yu azury...@gmail.com wrote:
yarn 2.2, not yarn 0.22, I am so sorry.

On Sun, Dec 15, 2013 at 8:31 PM, Azuryy Yu azury...@gmail.com wrote:
Hi,
Spark-0.8.1 supports yarn 0.22, right? Where can I find the release notes?
Thanks.

On Sun, Dec 15, 2013 at 3:20 AM, Henry Saputra henry.sapu...@gmail.com wrote:
Yeah, seems like it. He was OK with our prev release. Let's wait for his reply.

On Saturday, December 14, 2013, Patrick Wendell wrote:
Henry - from that thread it looks like sebb's concern was something different from this.

On Sat, Dec 14, 2013 at 11:08 AM, Henry Saputra henry.sapu...@gmail.com wrote:
Hi Patrick,
Yeap, I agree, but technically an ASF VOTE releases source only — there is even debate about it =) — so putting it in the vote staging artifact could confuse people, because in our case we do package 3rd party libraries in the binary jars. I have sent an email to sebb asking for clarification about his concern on the general@ list.
- Henry

On Sat, Dec 14, 2013 at 10:56 AM, Patrick Wendell pwend...@gmail.com wrote:
Hey Henry,
One thing a lot of people do during the vote is test the binaries and make sure they work. This is really valuable. If you'd like, I could add a caveat to the vote thread explaining that we are only voting on the source.
- Patrick

On Sat, Dec 14, 2013 at 10:40 AM, Henry Saputra henry.sapu...@gmail.com wrote:
Actually we should be fine putting the binaries there as long as the VOTE is for the source. Let's verify with sebb on the general@ list about his concern.
- Henry

On Sat, Dec 14, 2013 at 10:31 AM, Henry Saputra henry.sapu...@gmail.com wrote:
Hi Patrick, as sebb has mentioned, let's move the binaries from the voting directory in your people.apache.org directory. ASF release voting is for source code and not binaries; technically we provide binaries for convenience. And add a link to the KEYS location in the dist [1] to let people verify signatures. Sorry for the late response to the VOTE thread, guys.
- Henry
[1] https://dist.apache.org/repos/dist/release/incubator/spark/KEYS

On Fri, Dec 13, 2013 at 6:37 PM, Patrick Wendell pwend...@gmail.com wrote:
The vote is now closed. This vote passes with 5 PPMC +1's and no 0 or -1 votes.
+1 (5 Total): Matei Zaharia*, Nick Pentreath*, Patrick Wendell*, Prashant Sharma*, Tom Graves*
0 (0 Total)
-1 (0 Total)
* = Binding Vote
As per the incubator release guide [1] I'll be sending this to the general incubator list for a final vote from IPMC members.
[1] http://incubator.apache.org/guides/releasemanagement.html#best-practice-incubator-release-vote

On Thu, Dec 12, 2013 at 8:59 AM, Evan Chan e...@ooyala.com wrote:
I'd be personally fine with a standard workflow of assemble-deps + packaging just the Spark files as separate packages, if it speeds up everyone's development time.

On Wed, Dec 11, 2013 at 1:10 PM, Mark Hamstra m...@clearstorydata.com wrote: I
RE: Scala 2.10 Merge
Hi Patrick

What does that mean for dropping YARN 2.2? The code seems to still be there. You mean that if built upon 2.2 it will break and won't work, right? Since the home-made akka build on scala 2.10 is not there. If that's the case, can we just use akka 2.3-M1, which runs on protobuf 2.5, as a replacement?

Best Regards,
Raymond Liu

-----Original Message-----
From: Patrick Wendell [mailto:pwend...@gmail.com]
Sent: Thursday, December 12, 2013 4:21 PM
To: dev@spark.incubator.apache.org
Subject: Scala 2.10 Merge

Hi Developers,

In the next few days we are planning to merge Scala 2.10 support into Spark. For those that haven't been following this, Prashant Sharma has been maintaining the scala-2.10 branch of Spark for several months. This branch is current with master and has been reviewed for merging:

https://github.com/apache/incubator-spark/tree/scala-2.10

Scala 2.10 support is one of the most requested features for Spark - it will be great to get this into Spark 0.9! Please note that *Scala 2.10 is not binary compatible with Scala 2.9*. With that in mind, I wanted to give a few heads-up/requests to developers:

If you are developing applications on top of Spark's master branch, those will need to migrate to Scala 2.10. You may want to download and test the current scala-2.10 branch in order to make sure you will be okay as Spark developments move forward. Of course, you can always stick with the current master commit and be fine (I'll cut a tag when we do the merge in order to delineate where the version changes). Please open new threads on the dev list to report and discuss any issues.

This merge will temporarily drop support for YARN 2.2 on the master branch. This is because the workaround we used was only compiled for Scala 2.9. We are going to come up with a more robust solution to YARN 2.2 support before releasing 0.9.

Going forward, we will continue to make maintenance releases on branch-0.8 which will remain compatible with Scala 2.9.

For those interested, the primary code changes in this merge are upgrading the akka version, changing the use of Scala 2.9's ClassManifest construct to Scala 2.10's ClassTag, and updating the spark shell to work with Scala 2.10's repl.

- Patrick
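For anyone gauging the migration effort, the ClassManifest-to-ClassTag change mentioned above is typically a one-line substitution in user code. An illustrative sketch (the method name is made up, not from the branch):

    import scala.reflect.ClassTag

    // Scala 2.9 style, which stops working after the merge:
    //   def asArray[T: ClassManifest](xs: Seq[T]): Array[T] = xs.toArray

    // Scala 2.10 style: ClassTag plays the same evidence role, letting the
    // compiler materialize the Array[T] for an abstract type parameter.
    def asArray[T: ClassTag](xs: Seq[T]): Array[T] = xs.toArray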
RE: Scala 2.10 Merge
Hi Patrick

So what's the plan for supporting Yarn 2.2 in 0.9? As far as I can see, if you want to support both 2.2 and 2.0, then due to the protobuf version incompatibility issue you need two versions of akka anyway. Akka 2.3-M1 looks like it has a little bit of API change; we could probably isolate the code like what we did on the yarn part of the API. I remember it was mentioned that using reflection for the different APIs is preferred.

So the purpose of using reflection is to use one released bin jar to support both versions of Hadoop/Yarn at runtime, instead of building different bin jars at compile time? Then all code related to hadoop would also be built in separate modules for loading on demand? This sounds to me like it involves a lot of work. And you would still need a shim layer and separate code for the different version APIs, depending on different versions of Akka etc. That sounds like an even stricter demand versus our current approach on master, with a dynamic class loader in addition — and the problems we are facing now would still be there?

Best Regards,
Raymond Liu

-----Original Message-----
From: Patrick Wendell [mailto:pwend...@gmail.com]
Sent: Thursday, December 12, 2013 5:13 PM
To: dev@spark.incubator.apache.org
Subject: Re: Scala 2.10 Merge

Also - the code is still there because of a recent merge that took in some newer changes... we'll be removing it for the final merge.

On Thu, Dec 12, 2013 at 1:12 AM, Patrick Wendell pwend...@gmail.com wrote:

Hey Raymond,

This won't work because AFAIK akka 2.3-M1 is not binary compatible with akka 2.2.3 (right?). For all of the non-yarn 2.2 versions we need to still use the older protobuf library, so we'd need to support both. I'd also be concerned about having a reference to a non-released version of akka. Akka is the source of our hardest-to-find bugs, and simultaneously trying to support 2.2.3 and 2.3-M1 is a bit daunting. Of course, if you are building off of master you can maintain a fork that uses this.

- Patrick

On Thu, Dec 12, 2013 at 12:42 AM, Liu, Raymond raymond@intel.com wrote:

Hi Patrick

What does that mean for dropping YARN 2.2? The code seems to still be there. You mean that if built upon 2.2 it will break and won't work, right? Since the home-made akka build on scala 2.10 is not there. If that's the case, can we just use akka 2.3-M1, which runs on protobuf 2.5, as a replacement?

Best Regards,
Raymond Liu

-----Original Message-----
From: Patrick Wendell [mailto:pwend...@gmail.com]
Sent: Thursday, December 12, 2013 4:21 PM
To: dev@spark.incubator.apache.org
Subject: Scala 2.10 Merge

Hi Developers,

In the next few days we are planning to merge Scala 2.10 support into Spark. For those that haven't been following this, Prashant Sharma has been maintaining the scala-2.10 branch of Spark for several months. This branch is current with master and has been reviewed for merging:

https://github.com/apache/incubator-spark/tree/scala-2.10

Scala 2.10 support is one of the most requested features for Spark - it will be great to get this into Spark 0.9! Please note that *Scala 2.10 is not binary compatible with Scala 2.9*. With that in mind, I wanted to give a few heads-up/requests to developers:

If you are developing applications on top of Spark's master branch, those will need to migrate to Scala 2.10. You may want to download and test the current scala-2.10 branch in order to make sure you will be okay as Spark developments move forward. Of course, you can always stick with the current master commit and be fine (I'll cut a tag when we do the merge in order to delineate where the version changes). Please open new threads on the dev list to report and discuss any issues.

This merge will temporarily drop support for YARN 2.2 on the master branch. This is because the workaround we used was only compiled for Scala 2.9. We are going to come up with a more robust solution to YARN 2.2 support before releasing 0.9.

Going forward, we will continue to make maintenance releases on branch-0.8 which will remain compatible with Scala 2.9.

For those interested, the primary code changes in this merge are upgrading the akka version, changing the use of Scala 2.9's ClassManifest construct to Scala 2.10's ClassTag, and updating the spark shell to work with Scala 2.10's repl.

- Patrick
RE: How to resolve the right dependency which is enabled and built/installed with a profile?
Actually, I think it is possible, though it probably could not be done by the current Maven implementation, as long as B can know which libjar the shim A is built upon. I say it is possible because I can achieve the goal with SBT: when sbt does publish-local, it fills the profile info (actually not a profile in sbt, but some compile-time config) into A's final pom. Then when assembling B, the installed A does not have any profile info in its pom, and B assembles with the right libjar which A was built upon.

Best Regards,
Raymond Liu

-----Original Message-----
From: Stephen Connolly [mailto:stephen.alan.conno...@gmail.com]
Sent: Wednesday, December 11, 2013 6:47 PM
To: Maven Users List
Subject: Re: How to resolve the right dependency which is enabled and built/installed with a profile?

You are looking for the impossible.

You want A to work with any version of libjar. You want B to use A but not know which version of libjar the A it is using depends on... actually to be independent of the version of libjar that A depends on... but also to bundle the specific one...

Rethink what you are asking and you will see that you are asking for an impossibility... you are asking somebody to hop on one leg and mandating that the one leg be both their left and right leg. When you are asking for an impossibility, that is usually a sign that you are going at things the wrong way.

If A is going to become part of a fatjar at some point, you need to make a decision *at the point of creating the fatjar* as to which libjar will be included within A. Or else perhaps you include all versions of libjar and provide a means for A to decide which version it wants... there are many ways to achieve this...

* use the service provider pattern to introduce another shim layer and then use the shade plugin to put all the shim impl libjar deps into independent package namespaces
* use a custom classloader and package libjar as resources, so that A's entry point loads the correct libjar at runtime
* etc.

With the above you would be including *all* versions of libjar within A and the whole thing becomes moot anyway...

-Stephen

On 11 December 2013 00:52, Liu, Raymond raymond@intel.com wrote:

Thanks Stephen

I see your solution is to let B manage the libjar version. This is against my wish: I want B to know nothing about A's internal implementation. In the future, A might depend on a v3.0 libjar, and I wish to just repackage B to make it work, not revise B's code or assembly rules. And sometimes A might be buried deep in the dependency tree; those projects might not even be aware that they depend on A. They just wish it works with whatever the current A binary jar is, which then needs the right libjar dependency when assembling.

And I know it seems hard to use profiles to manipulate dependencies. But in theory, I think this is a reasonable requirement: a project should not need to take care of its dependencies' internal implementation of what they depend on; it should just work. E.g., if the POM file were installed in such a way that, when built with a profile, the corresponding dependencies and any other modifications were filled into the final POM file (since the profile is not used anyway when resolving dependencies, why keep it there? For the source jar?), then projects depending on it wouldn't need to worry about A's profile; the installed or downloaded binary jar's pom would already be the correct one used to build A.

So, using a profile might not be the right solution. But if there isn't an automatic way to meet this requirement, can I take it as a missing feature?

Best Regards,
Raymond Liu

-----Original Message-----
From: Stephen Connolly [mailto:stephen.alan.conno...@gmail.com]
Sent: Tuesday, December 10, 2013 6:57 PM
To: Maven Users List
Subject: Re: How to resolve the right dependency which is enabled and built/installed with a profile?

Using profiles to manipulate dependencies is a route to madness. A module's dependencies should be a constant... it was a mistake to allow dependencies within profiles.

The correct solution is to create additional modules that aggregate in the corresponding lib versions (this is also a bad plan, but less bad than your current plan by quite some margin). The additional modules would depend on A and the respective version of libjar and bundle the two together. Then B can depend on the corresponding intermediate... Now at this point you will start to see the madness returning... that is, if you need two versions of B.

The better solution is to just let B depend on A, and then do the fatjar as the final layer after B, and *override* the transitive dep on libjar in the two fatjar-building modules.

On Tuesday, 10 December 2013, Liu, Raymond wrote:

Hi

I have a project with module A that can be built with or without a profile, say -Pnewlib; thus I can have it built with different versions of a library dependency — let's say by default it uses libjar-1.0 and with -Pnewlib it uses libjar-2.0. Then I have a module B that depends on module A. The issue is: how can I get the right dependency on libjar in module B? I want to assemble a fat jar, so I need to figure out which version of libjar to include. In my test, it seems that when I mvn -Pnewlib install module A, it builds with libjar-2.0, but the installed pom does not reflect this. Thus when module B resolves the dependency, it resolves it as libjar-1.0 (though when building module B, -Pnewlib is also passed in, I think this is not passed on to the dependent package when resolving the dependency). I don't want module B to know anything about libjar-1.0 and libjar-2.0; that should be handled by module A, and module B should just simply call into module A.

Actually, I think the same applies if A and B are not modules in one project but standalone projects, in which case B will not define the profile at all. So how can I achieve this goal? That is, A takes care of the lib dependency when built and installed, and B, which depends on A, can retrieve the right lib dependency when assembling.

Best Regards,
Raymond Liu

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@maven.apache.org
For additional commands, e-mail: users-h...@maven.apache.org

--
Sent from my phone

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@maven.apache.org
For additional commands, e-mail: users-h...@maven.apache.org
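As a concrete illustration of the SBT route Raymond describes at the top of this message, a build could pick the libjar version from a JVM property at publish time. This is a sketch only, with hypothetical coordinates (org.example, libjar) and property name (newlib):

    // build.sbt (sketch): select the dependency from a JVM property,
    // e.g.  sbt -Dnewlib publish-local
    val useNewLib = sys.props.contains("newlib")

    libraryDependencies += {
      if (useNewLib) "org.example" % "libjar" % "2.0"
      else           "org.example" % "libjar" % "1.0"
    }

Because sbt resolves this condition before generating the pom, the published pom of A carries the concrete libjar version, which is exactly what lets B stay ignorant of the choice.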
RE: How to resolve the right dependency which is enabled and built/installed with a profile?
Thanks Stephen

I see your solution is to let B manage the libjar version. This is against my wish: I want B to know nothing about A's internal implementation. In the future, A might depend on a v3.0 libjar, and I wish to just repackage B to make it work, not revise B's code or assembly rules. And sometimes A might be buried deep in the dependency tree; those projects might not even be aware that they depend on A. They just wish it works with whatever the current A binary jar is, which then needs the right libjar dependency when assembling.

And I know it seems hard to use profiles to manipulate dependencies. But in theory, I think this is a reasonable requirement: a project should not need to take care of its dependencies' internal implementation of what they depend on; it should just work. E.g., if the POM file were installed in such a way that, when built with a profile, the corresponding dependencies and any other modifications were filled into the final POM file (since the profile is not used anyway when resolving dependencies, why keep it there? For the source jar?), then projects depending on it wouldn't need to worry about A's profile; the installed or downloaded binary jar's pom would already be the correct one used to build A.

So, using a profile might not be the right solution. But if there isn't an automatic way to meet this requirement, can I take it as a missing feature?

Best Regards,
Raymond Liu

-----Original Message-----
From: Stephen Connolly [mailto:stephen.alan.conno...@gmail.com]
Sent: Tuesday, December 10, 2013 6:57 PM
To: Maven Users List
Subject: Re: How to resolve the right dependency which is enabled and built/installed with a profile?

Using profiles to manipulate dependencies is a route to madness. A module's dependencies should be a constant... it was a mistake to allow dependencies within profiles.

The correct solution is to create additional modules that aggregate in the corresponding lib versions (this is also a bad plan, but less bad than your current plan by quite some margin). The additional modules would depend on A and the respective version of libjar and bundle the two together. Then B can depend on the corresponding intermediate... Now at this point you will start to see the madness returning... that is, if you need two versions of B.

The better solution is to just let B depend on A, and then do the fatjar as the final layer after B, and *override* the transitive dep on libjar in the two fatjar-building modules.

On Tuesday, 10 December 2013, Liu, Raymond wrote:

Hi

I have a project with module A that can be built with or without a profile, say -Pnewlib; thus I can have it built with different versions of a library dependency — let's say by default it uses libjar-1.0 and with -Pnewlib it uses libjar-2.0. Then I have a module B that depends on module A. The issue is: how can I get the right dependency on libjar in module B? I want to assemble a fat jar, so I need to figure out which version of libjar to include. In my test, it seems that when I mvn -Pnewlib install module A, it builds with libjar-2.0, but the installed pom does not reflect this. Thus when module B resolves the dependency, it resolves it as libjar-1.0 (though when building module B, -Pnewlib is also passed in, I think this is not passed on to the dependent package when resolving the dependency). I don't want module B to know anything about libjar-1.0 and libjar-2.0; that should be handled by module A, and module B should just simply call into module A.

Actually, I think the same applies if A and B are not modules in one project but standalone projects, in which case B will not define the profile at all. So how can I achieve this goal? That is, A takes care of the lib dependency when built and installed, and B, which depends on A, can retrieve the right lib dependency when assembling.

Best Regards,
Raymond Liu

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@maven.apache.org
For additional commands, e-mail: users-h...@maven.apache.org

--
Sent from my phone

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@maven.apache.org
For additional commands, e-mail: users-h...@maven.apache.org
RE: How to resolve the right dependency which is enabled and built/installed with a profile?
Well, the reason for using different versions of libjar is that it is a client API working with different versions of the server counterpart code. They are not compatible; instead, A is a shim layer that hooks into the different libjar APIs and provides a common interface for B to use, so as to hide the API changes across versions. And in my case B needs to be assembled into a fat jar for easy usage, so switching C at runtime won't be possible. Otherwise there would be no need for an extra C; just switching libjar would be OK.

Best Regards,
Raymond Liu

-----Original Message-----
From: Ron Wheeler [mailto:rwhee...@artifact-software.com]
Sent: Wednesday, December 11, 2013 12:24 PM
To: users@maven.apache.org
Subject: Re: How to resolve the right dependency which is enabled and built/installed with a profile?

B depends on A
A depends on C
C depends on libjar x.0 (provided)

C has no code and is only a pom, not a jar. You can then swap in any compatible version of libjar by providing it at run-time. Still not sure why you can not just use the latest version of libjar if any version will work at run-time.

On 10/12/2013 7:52 PM, Liu, Raymond wrote:

Thanks Stephen

I see your solution is to let B manage the libjar version. This is against my wish: I want B to know nothing about A's internal implementation. In the future, A might depend on a v3.0 libjar, and I wish to just repackage B to make it work, not revise B's code or assembly rules. And sometimes A might be buried deep in the dependency tree; those projects might not even be aware that they depend on A. They just wish it works with whatever the current A binary jar is, which then needs the right libjar dependency when assembling.

And I know it seems hard to use profiles to manipulate dependencies. But in theory, I think this is a reasonable requirement: a project should not need to take care of its dependencies' internal implementation of what they depend on; it should just work. E.g., if the POM file were installed in such a way that, when built with a profile, the corresponding dependencies and any other modifications were filled into the final POM file (since the profile is not used anyway when resolving dependencies, why keep it there? For the source jar?), then projects depending on it wouldn't need to worry about A's profile; the installed or downloaded binary jar's pom would already be the correct one used to build A.

So, using a profile might not be the right solution. But if there isn't an automatic way to meet this requirement, can I take it as a missing feature?

Best Regards,
Raymond Liu

-----Original Message-----
From: Stephen Connolly [mailto:stephen.alan.conno...@gmail.com]
Sent: Tuesday, December 10, 2013 6:57 PM
To: Maven Users List
Subject: Re: How to resolve the right dependency which is enabled and built/installed with a profile?

Using profiles to manipulate dependencies is a route to madness. A module's dependencies should be a constant... it was a mistake to allow dependencies within profiles.

The correct solution is to create additional modules that aggregate in the corresponding lib versions (this is also a bad plan, but less bad than your current plan by quite some margin). The additional modules would depend on A and the respective version of libjar and bundle the two together. Then B can depend on the corresponding intermediate... Now at this point you will start to see the madness returning... that is, if you need two versions of B.

The better solution is to just let B depend on A, and then do the fatjar as the final layer after B, and *override* the transitive dep on libjar in the two fatjar-building modules.

On Tuesday, 10 December 2013, Liu, Raymond wrote:

Hi

I have a project with module A that can be built with or without a profile, say -Pnewlib; thus I can have it built with different versions of a library dependency — let's say by default it uses libjar-1.0 and with -Pnewlib it uses libjar-2.0. Then I have a module B that depends on module A. The issue is: how can I get the right dependency on libjar in module B? I want to assemble a fat jar, so I need to figure out which version of libjar to include. In my test, it seems that when I mvn -Pnewlib install module A, it builds with libjar-2.0, but the installed pom does not reflect this. Thus when module B resolves the dependency, it resolves it as libjar-1.0 (though when building module B, -Pnewlib is also passed in, I think this is not passed on to the dependent package when resolving the dependency). I don't want module B to know anything about libjar-1.0 and libjar-2.0; that should be handled by module A, and module B should just simply call into module A.

Actually, I think the same applies if A and B are not modules in one project but standalone projects, in which case B will not define the profile at all. So how can I achieve this goal? That is, A takes care of the lib dependency when built and installed, and B, which depends on A, can retrieve the right lib dependency when assembling.

Best Regards,
Raymond Liu
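To make the shim-layer arrangement concrete, here is a hypothetical Scala sketch of what A might expose; every name is illustrative, not from the actual project:

    // Common interface exposed by shim layer A; B codes only against this.
    trait ClientShim {
      def connect(host: String): Unit
      def submit(job: String): Unit
    }

    // One implementation per libjar version; only one of these (and its
    // matching libjar) is compiled into a given build of A, which is why
    // the assembled fat jar of B must carry the matching libjar version.
    class ClientShimV1 extends ClientShim {
      def connect(host: String): Unit = { /* calls into the libjar-1.0 API */ }
      def submit(job: String): Unit = { /* calls into the libjar-1.0 API */ }
    }

B never changes when A is rebuilt against libjar-2.0; only the jar that gets assembled alongside A differs.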
How to resolve the right dependency which is enabled and built/installed with a profile?
Hi

I have a project with module A that can be built with or without a profile, say -Pnewlib; thus I can have it built with different versions of a library dependency — let's say by default it uses libjar-1.0 and with -Pnewlib it uses libjar-2.0. Then I have a module B that depends on module A.

The issue is: how can I get the right dependency on libjar in module B? I want to assemble a fat jar, so I need to figure out which version of libjar to include. In my test, it seems that when I mvn -Pnewlib install module A, it builds with libjar-2.0, but the installed pom does not reflect this. Thus when module B resolves the dependency, it resolves it as libjar-1.0 (though when building module B, -Pnewlib is also passed in, I think this is not passed on to the dependent package when resolving the dependency).

I don't want module B to know anything about libjar-1.0 and libjar-2.0; that should be handled by module A, and module B should just simply call into module A. Actually, I think the same applies if A and B are not modules in one project but standalone projects, in which case B will not define the profile at all.

So how can I achieve this goal? That is, A takes care of the lib dependency when built and installed, and B, which depends on A, can retrieve the right lib dependency when assembling.

Best Regards,
Raymond Liu

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@maven.apache.org
For additional commands, e-mail: users-h...@maven.apache.org
RE: Spark over YARN
YARN alpha API support is already there. If you mean the YARN stable API in Hadoop 2.2, it will probably be in 0.8.1. Best Regards, Raymond Liu From: Pranay Tonpay [mailto:pranay.ton...@impetus.co.in] Sent: Thursday, December 05, 2013 12:53 AM To: user@spark.incubator.apache.org Subject: Spark over YARN Hi, Which release is Spark over YARN targeted for? I presume it's in progress at the moment. Please correct me if my info is outdated. Thx pranay
RE: Worker failed to connect when build with SPARK_HADOOP_VERSION=2.2.0
What version of the code are you using? 2.2.0 support is not yet merged into trunk. Check out https://github.com/apache/incubator-spark/pull/199 Best Regards, Raymond Liu From: horia@gmail.com [mailto:horia@gmail.com] On Behalf Of Horia Sent: Monday, December 02, 2013 3:00 PM To: user@spark.incubator.apache.org Subject: Re: Worker failed to connect when build with SPARK_HADOOP_VERSION=2.2.0 Has this been resolved? Forgive me if I missed the follow-up, but I've been having the exact same problem. - Horia On Fri, Nov 22, 2013 at 5:38 AM, Maxime Lemaire digital@gmail.com wrote: Hi all, When I build Spark with Hadoop 2.2.0 support, workers can't connect to the Spark master anymore. The network is up and hostnames are correct. Tcpdump can clearly see the workers trying to connect (tcpdump outputs at the end). The same setup with a Spark build without SPARK_HADOOP_VERSION (or with SPARK_HADOOP_VERSION=2.0.5-alpha) works fine! Some details: pmtx-master01 : master pmtx-master02 : slave (the behavior is the same if I launch both master and slave from the same box)

Building HADOOP 2.2.0 support, fresh install on pmtx-master01:
# SPARK_HADOOP_VERSION=2.2.0 sbt/sbt assembly
...build successful
Fresh install on pmtx-master02:
# SPARK_HADOOP_VERSION=2.2.0 sbt/sbt assembly
...build successful

On pmtx-master01:
# ./bin/start-master.sh
starting org.apache.spark.deploy.master.Master, logging to /cluster/bin/spark-0.8.0-incubating/bin/../logs/spark-root-org.apache.spark.deploy.master.Master-1-pmtx-master01.out
# netstat -an | grep 7077
tcp6 0 0 10.90.XX.XX:7077 :::* LISTEN

On pmtx-master02:
# nc -v pmtx-master01 7077
pmtx-master01 [10.90.XX.XX] 7077 (?) open
# ./spark-class org.apache.spark.deploy.worker.Worker spark://pmtx-master01:7077
13/11/22 10:57:50 INFO Slf4jEventHandler: Slf4jEventHandler started
13/11/22 10:57:50 INFO Worker: Starting Spark worker pmtx-master02:42271 with 8 cores, 22.6 GB RAM
13/11/22 10:57:50 INFO Worker: Spark home: /cluster/bin/spark
13/11/22 10:57:50 INFO WorkerWebUI: Started Worker web UI at http://pmtx-master02:8081
13/11/22 10:57:50 INFO Worker: Connecting to master spark://pmtx-master01:7077
13/11/22 10:57:50 ERROR Worker: Connection to master failed! Shutting down.

With spark-shell on pmtx-master02:
# MASTER=spark://pmtx-master01:7077 ./spark-shell
Welcome to [Spark ASCII-art banner] version 0.8.0
Using Scala version 2.9.3 (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_31)
Initializing interpreter...
Creating SparkContext...
13/11/22 11:19:29 INFO Slf4jEventHandler: Slf4jEventHandler started
13/11/22 11:19:29 INFO SparkEnv: Registering BlockManagerMaster
13/11/22 11:19:29 INFO MemoryStore: MemoryStore started with capacity 323.9 MB.
13/11/22 11:19:29 INFO DiskStore: Created local directory at /tmp/spark-local-20131122111929-3e3c
13/11/22 11:19:29 INFO ConnectionManager: Bound socket to port 42249 with id = ConnectionManagerId(pmtx-master02,42249)
13/11/22 11:19:29 INFO BlockManagerMaster: Trying to register BlockManager
13/11/22 11:19:29 INFO BlockManagerMaster: Registered BlockManager
13/11/22 11:19:29 INFO HttpBroadcast: Broadcast server started at http://10.90.66.67:52531
13/11/22 11:19:29 INFO SparkEnv: Registering MapOutputTracker
13/11/22 11:19:29 INFO HttpFileServer: HTTP File server directory is /tmp/spark-40525f81-f883-45d5-92ad-bbff44ecf435
13/11/22 11:19:29 INFO SparkUI: Started Spark Web UI at http://pmtx-master02:4040
13/11/22 11:19:29 INFO Client$ClientActor: Connecting to master spark://pmtx-master01:7077
13/11/22 11:19:30 ERROR Client$ClientActor: Connection to master failed; stopping client
13/11/22 11:19:30 ERROR SparkDeploySchedulerBackend: Disconnected from Spark cluster!
13/11/22 11:19:30 ERROR ClusterScheduler: Exiting due to error from cluster scheduler: Disconnected from Spark cluster
snip

WORKING: Building HADOOP 2.0.5-alpha support. On pmtx-master01, now I'm building hadoop 2.0.5-alpha:
# sbt/sbt clean
...
# SPARK_HADOOP_VERSION=2.0.5-alpha sbt/sbt assembly
...
# ./bin/start-master.sh
starting org.apache.spark.deploy.master.Master, logging to /cluster/bin/spark-0.8.0-incubating/bin/../logs/spark-root-org.apache.spark.deploy.master.Master-1-pmtx-master01.out

Same build on pmtx-master02:
# sbt/sbt clean
...build successful...
# SPARK_HADOOP_VERSION=2.0.5-alpha sbt/sbt assembly
...build successful...
# ./spark-class org.apache.spark.deploy.worker.Worker spark://pmtx-master01:7077
13/11/22 11:25:34 INFO Slf4jEventHandler: Slf4jEventHandler started
13/11/22 11:25:34 INFO Worker: Starting Spark worker pmtx-master02:33768 with 8 cores, 22.6 GB RAM
13/11/22 11:25:34 INFO Worker: Spark home: /cluster/bin/spark
13/11/22 11:25:34 INFO WorkerWebUI: Started Worker web UI at http://pmtx-master02:8081
13/11/22 11:25:34 INFO Worker: Connecting to master
Any doc related to hive on hadoop 2.2?
Hi, It seems to me that a lot of Hadoop 2.2 support work has been done on trunk, but I cannot find any related documentation. Is there any doc I can refer to, say build/run BKM etc.? Especially the parts related to the YARN stable API in Hadoop 2.2, since it changed a lot from alpha. Best Regards, Raymond Liu
RE: which repo for kafka_2.10 ?
Thanks. Good to know that it will come in the formal 0.8.0 release soon. I am really looking for a public Maven repo for porting Spark onto it, instead of running a local version ;) probably I can make do with a beta1 firstly. Best Regards, Raymond Liu -Original Message- From: Joe Stein [mailto:joe.st...@stealth.ly] Sent: Friday, November 08, 2013 9:15 PM To: users@kafka.apache.org Subject: Re: which repo for kafka_2.10 ? 0.8.0 is in the process of being released, and when that is done, Scala 2.10 will be in Maven Central. Until then you can do ./sbt ++2.10 publish-local after checking out the source of Kafka, as Viktor just said, yup. You will be prompted to sign the jars, which you can do with a pgp key, or remove the pgp plugin before running. /*** Joe Stein Founder, Principal Consultant Big Data Open Source Security LLC http://www.stealth.ly Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop / On Fri, Nov 8, 2013 at 8:11 AM, Viktor Kolodrevskiy viktor.kolodrevs...@gmail.com wrote: Are you looking for a maven repo? You can always check out the sources from http://kafka.apache.org/code.html and build it yourself. 2013/11/8 Liu, Raymond raymond@intel.com: If I want to use kafka_2.10 0.8.0-beta1, which repo should I go to? It seems the Apache repo doesn't have it, while there are com.sksamuel.kafka and com.twitter tormenta-kafka_2.10. Which one should I go to, or neither? Best Regards, Raymond Liu -- Thanks, Viktor
what's the strategy for code sync between branches e.g. scala-2.10 v.s. master?
Hi, It seems to me that dev branches are kept in sync with master by repeatedly merging in master's code, e.g. the scala-2.10 branch continuously merges the latest master code into itself. I am wondering: what is the general guideline for doing this? It seems to me that not every commit on master is merged into the scala-2.10 branch. Say, on Oct 10 there was a merge from master into the scala-2.10 branch, yet some commits from Oct 4 are not included, e.g. the StandaloneX to CoarseGrainedX rename. So I am puzzled: how do we track which commits are already merged into the scala-2.10 branch and which are not? And how do we plan to merge the scala-2.10 branch back into master? Also, is there any good way to find out whether a given change was made on the 2.10 branch or came from master through a merge operation? It seems pretty hard to identify them and keep the code in sync. It seems to me that a rebase on master would not lead to the above issues, since all the branch changes would stay on top. So is there any reason that merging was chosen instead of rebasing, other than not wanting a force update on checked-out sources? Best Regards, Raymond Liu
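(For reference, one way to audit this with stock git, assuming the branch names above; --cherry-pick drops commits whose patch already exists on the other side of the symmetric difference, so what remains is what has genuinely not crossed over:)

# commits marked '<' exist only on master, '>' only on scala-2.10
git log --oneline --left-right --cherry-pick master...scala-2.10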
RE: issue regarding akka, protobuf and Hadoop version
I plan to do the work on the scala-2.10 branch, which has already moved to Akka 2.2.3. I hope that moving to Akka 2.3-M1 (which supports protobuf 2.5.x) will not cause many problems; I will treat it as a test to see whether there are further issues, and then wait for the formal release of Akka 2.3.x. The issue is that I can see many commits on the master branch that are not merged into the scala-2.10 branch yet. The latest merge seems to have happened on Oct 11, and, as I mentioned in the dev branch merge/sync thread, it seems many earlier commits are not included, which will surely bring extra work for future code merging/rebasing. So again, what's the code sync strategy, and what's the plan for merging back into master? Best Regards, Raymond Liu -Original Message- From: Reynold Xin [mailto:r...@apache.org] Sent: Tuesday, November 05, 2013 8:34 AM To: dev@spark.incubator.apache.org Subject: Re: issue regarding akka, protobuf and Hadoop version I chatted with Matt Massie about this, and here are some options: 1. Use dependency injection in google-guice to make Akka use one version of protobuf, and YARN use the other version. 2. Look into OSGi to accomplish the same goal. 3. Rewrite the messaging part of Spark to use a simple, custom RPC library instead of Akka. We are really only using a very simple subset of Akka features, and we can probably implement a simple RPC library tailored for Spark quickly. We should only do this as the last resort. 4. Talk to the Akka guys and hope they can make a maintenance release of Akka that supports protobuf 2.5. None of these are ideal, but we'd have to pick one. It would be great if you have other suggestions. On Sun, Nov 3, 2013 at 11:46 PM, Liu, Raymond raymond@intel.com wrote: Hi, I am working on porting Spark onto Hadoop 2.2.0, with some renaming and calls into the new YARN API done. I can bring up the Spark master, but I encountered an issue where the Executor actor could not connect to the Driver actor. After some investigation, I found the root cause: akka-remote does not support protobuf 2.5.0 before Akka 2.3, and Hadoop moved to protobuf 2.5.0 as of 2.1-beta. The problem is that if I exclude the protobuf dependency coming from Hadoop and force the protobuf dependency to 2.4.1, the compile/packaging will fail, since the hadoop-common jar requires a new interface from protobuf 2.5.0. So any suggestions on this? Best Regards, Raymond Liu
Executor could not connect to Driver?
Hi, I am encountering an issue where the executor actor could not connect to the Driver actor, and I could not figure out the reason. Say the Driver actor is listening on :35838

root@sr434:~# netstat -lpv
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 *:50075 *:* LISTEN 18242/java
tcp 0 0 *:50020 *:* LISTEN 18242/java
tcp 0 0 *:ssh *:* LISTEN 1325/sshd
tcp 0 0 *:50010 *:* LISTEN 18242/java
tcp6 0 0 sr434:35838 [::]:* LISTEN 9420/java
tcp6 0 0 [::]:40390 [::]:* LISTEN 9420/java
tcp6 0 0 [::]:4040 [::]:* LISTEN 9420/java
tcp6 0 0 [::]:8040 [::]:* LISTEN 28324/java
tcp6 0 0 [::]:60712 [::]:* LISTEN 28324/java
tcp6 0 0 [::]:8042 [::]:* LISTEN 28324/java
tcp6 0 0 [::]:34028 [::]:* LISTEN 9420/java
tcp6 0 0 [::]:ssh [::]:* LISTEN 1325/sshd
tcp6 0 0 [::]:45528 [::]:* LISTEN 9420/java
tcp6 0 0 [::]:13562 [::]:* LISTEN 28324/java

while the executor reports errors as below:
13/11/01 13:16:43 INFO executor.CoarseGrainedExecutorBackend: Connecting to driver: akka://spark@sr434:35838/user/CoarseGrainedScheduler
13/11/01 13:16:43 ERROR executor.CoarseGrainedExecutorBackend: Driver terminated or disconnected! Shutting down.

Any idea? Best Regards, Raymond Liu
RE: spark-0.8.0 and hadoop-2.1.0-beta
I am also working on porting the trunk code onto 2.2.0. There seem to be quite a few API changes, but many of them are just renames. YARN 2.1.0-beta also added some client APIs for easier interaction with the YARN framework, but there are not many examples of how to use them (the API and wiki docs are both old and do not reflect the new API), so some parts of the Spark YARN code will need to be rewritten against the new client API. And I am not very familiar with the user credentials part of the code; it might take time, since it seems this part of the code also changed a little: some methods are gone, and I cannot find the replacements, or they are not needed anymore. Best Regards, Raymond Liu From: Matei Zaharia [mailto:matei.zaha...@gmail.com] Sent: Wednesday, October 30, 2013 2:35 AM To: user@spark.incubator.apache.org Subject: Re: spark-0.8.0 and hadoop-2.1.0-beta I'm curious, Viren, do you have a patch you could post to build this against YARN 2.1 / 2.2? It would be nice to see how big the changes are. Matei On Sep 30, 2013, at 10:14 AM, viren kumar vire...@gmail.com wrote: I was able to get Spark 0.8.0 to compile with Hadoop/YARN 2.1.0-beta by following some of the changes described here: http://hortonworks.com/blog/stabilizing-yarn-apis-for-apache-hadoop-2-beta-and-beyond/ That should help you build most of it. One change not covered there is the change from ProtoUtils.convertFromProtoFormat(containerToken, cmAddress) to ConverterUtils.convertFromYarn(containerToken, cmAddress). Not 100% sure that my changes are correct. Hope that helps, Viren On Sun, Sep 29, 2013 at 8:59 AM, Matei Zaharia matei.zaha...@gmail.com wrote: Hi Terance, YARN's API changed in an incompatible way in Hadoop 2.1.0, so I'd suggest sticking with 2.0.x for now. We may create a different branch for this version. Unfortunately, due to the API change it may not be possible to support this version while also supporting other widely-used versions like 0.23.x. Matei On Sep 29, 2013, at 11:00 AM, Terance Dias terance.d...@gmail.com wrote: Hi, I'm trying to build spark-0.8.0 with hadoop-2.1.0-beta. I have changed the following properties in the SparkBuild.scala file: val DEFAULT_HADOOP_VERSION = 2.1.0-beta val DEFAULT_YARN = true When I do sbt clean compile, I get an error saying [error] /usr/local/spark-0.8.0-incubating/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:42: not found: type AMRMProtocol [error] private var resourceManager: AMRMProtocol = null Thanks, Terance.
RE: if i configed NN HA,should i still need start backup node?
You don't need to, if the wiki page is correct. Best Regards, Raymond Liu From: ch huang [mailto:justlo...@gmail.com] Sent: Tuesday, October 29, 2013 12:01 PM To: user@hadoop.apache.org Subject: if i configed NN HA,should i still need start backup node? ATT
Yarn 2.2 docs or examples?
Hi, I am playing with YARN 2.2, trying to port some code from the pre-beta API onto the stable API. But both the wiki doc and the API doc for 2.2.0 seem to still stick to the old API. Though I could find some help from http://hortonworks.com/blog/stabilizing-yarn-apis-for-apache-hadoop-2-beta-and-beyond/ I still find it pretty hard to figure out the correct way to use the 2.2 API, since a lot of APIs changed and some NM / AMRM client APIs were added. I have to dig into the commit log or try to find clues in the unit test cases. So are there any docs or examples I can look at for reference? Best Regards, Raymond Liu
How to use Hadoop2 HA's logical name URL?
Hi, I have set up a Hadoop 2.2.0 HA cluster following: http://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html#Configuration_details and I can check both the active and standby namenode via the web interface. However, it seems that the logical name cannot be used to access HDFS? I have the following HA-related settings.

In core-site.xml:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://public-cluster</value>
</property>
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/root/.ssh/id_rsa</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.connect-timeout</name>
  <value>3</value>
</property>

And in hdfs-site.xml:

<property>
  <name>dfs.nameservices</name>
  <value>public-cluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.public-cluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.public-cluster.nn1</name>
  <value>10.0.2.31:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.public-cluster.nn2</name>
  <value>10.0.2.32:8020</value>
</property>
<property>
  <name>dfs.namenode.http-address.public-cluster.nn1</name>
  <value>10.0.2.31:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.public-cluster.nn2</name>
  <value>10.0.2.32:50070</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://10.0.0.144:8485;10.0.0.145:8485;10.0.0.146:8485/public-cluster</value>
</property>
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/mnt/DP_disk1/hadoop2/hdfs/jn</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

---

And then:
./bin/hdfs dfs -fs hdfs://public-cluster -ls /
-ls: java.net.UnknownHostException: public-cluster
Usage: hadoop fs [generic options] -ls [-d] [-h] [-R] [path ...]

While if I use the active namenode's URL, it works:
./bin/hdfs dfs -fs hdfs://10.0.2.31:8020 -ls /
Found 1 items
drwxr-xr-x - root supergroup 0 2013-10-24 14:30 /tmp

However, shouldn't this hdfs://public-cluster kind of thing work? Is there anything I might have missed to make it work? Thanks! Best Regards, Raymond Liu
RE: Using Hbase with NN HA
Encountered a similar issue with the NN HA URL. Have you made it work? Best Regards, Raymond Liu -Original Message- From: Siddharth Tiwari [mailto:siddharth.tiw...@live.com] Sent: Friday, October 18, 2013 5:17 PM To: user@hadoop.apache.org Subject: Using Hbase with NN HA Hi team, Can HBase be used with namenode HA in the latest hadoop-2.2.0? If yes, is there something else required to be done other than the following? 1. Set the hbase root dir to the logical name of the namenode service. 2. Keep core-site and hdfs-site in the hbase conf. I did the above two but the logical name is not recognized. Also, it would be helpful if I could get some help with which versions of HBase, Hive, Pig and Mahout are compatible with the latest YARN release, hadoop-2.2.0. I am using hbase-0.94.12 Thanks Sent from my iPhone
RE: How to use Hadoop2 HA's logical name URL?
Hmm, my bad. The nameservice ID was not consistent in one of the properties (dfs.client.failover.proxy.provider.mycluster above should have used public-cluster). After the fix, it works. Best Regards, Raymond Liu -Original Message- From: Liu, Raymond [mailto:raymond@intel.com] Sent: Thursday, October 24, 2013 3:03 PM To: user@hadoop.apache.org Subject: How to use Hadoop2 HA's logical name URL? snip
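(For reference, the offending property from the original mail, with the nameservice ID aligned to the one actually declared in dfs.nameservices:)

<property>
  <name>dfs.client.failover.proxy.provider.public-cluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>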
Fail to run on yarn with release version?
Hi, I can run Spark trunk code on top of YARN 2.0.5-alpha with:

SPARK_JAR=./core/target/spark-core-assembly-0.8.0-SNAPSHOT.jar ./run spark.deploy.yarn.Client \
--jar examples/target/scala-2.9.3/spark-examples_2.9.3-0.8.0-SNAPSHOT.jar \
--class spark.examples.SparkPi \
--args yarn-standalone \
--num-workers 3 \
--worker-memory 2g \
--worker-cores 2

However, if I use make-distribution.sh to build a release package and use that package on the cluster, it fails to run. I do copy the examples jar to the jars/ dir. The other modes, say standalone/mesos/local, run well with the release package. The error encountered is:

Exception in thread main java.io.IOException: No FileSystem for scheme: hdfs
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2265)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2272)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:86)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2311)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2293)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:317)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:163)
at spark.deploy.yarn.Client.prepareLocalResources(Client.scala:117)
at spark.deploy.yarn.Client.run(Client.scala:59)
at spark.deploy.yarn.Client$.main(Client.scala:318)
at spark.deploy.yarn.Client.main(Client.scala)

Google results point to hdfs/core-default.xml not being included in the fat jar, but I checked and it is included. Any idea on this issue? Thanks! Best Regards, Raymond Liu
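(A likely cause with fat jars is the META-INF/services/org.apache.hadoop.fs.FileSystem entries from hadoop-common and hadoop-hdfs overwriting each other during assembly, so the hdfs scheme loses its registration even though core-default.xml is present. A minimal workaround sketch, assuming you can touch the Configuration before FileSystem.get() is called; the namenode address is a placeholder:)

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class HdfsSchemeCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Register the hdfs scheme by hand in case the service file was clobbered in the fat jar.
        conf.set("fs.hdfs.impl", DistributedFileSystem.class.getName());
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020/"), conf);
        System.out.println("Got filesystem: " + fs.getUri());
    }
}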
RE: Failed to run wordcount on YARN
Hi Devaraj, Thanks a lot for the detailed explanation. Best Regards, Raymond Liu -Original Message- From: Devaraj k [mailto:devara...@huawei.com] Sent: Friday, July 12, 2013 4:24 PM To: user@hadoop.apache.org Subject: RE: Failed to run wordcount on YARN Hi Raymond, In Hadoop 2.0.5, the new FileInputFormat API doesn't support reading files recursively in the input dir; it only supports an input dir that directly contains files. If the input dir has any child dirs, it throws the error below. Recursive reading has been added in trunk with this JIRA: https://issues.apache.org/jira/browse/MAPREDUCE-3193. You can give the Job an input dir which doesn't have nested dirs, or you can make use of the old FileInputFormat API to read files recursively in the subdirs. Thanks, Devaraj k -Original Message- From: Liu, Raymond [mailto:raymond@intel.com] Sent: 12 July 2013 12:57 To: user@hadoop.apache.org Subject: Failed to run wordcount on YARN snip
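(A sketch of the recursive option, assuming a release that carries the switch added around MAPREDUCE-3193; on 2.0.5-alpha itself you would instead point the job at a dir without subdirs, or fall back to the old API as Devaraj says:)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class RecursiveInputSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Descend into child dirs such as /tmp/hadoop-yarn instead of failing on them.
        conf.setBoolean("mapreduce.input.fileinputformat.input.dir.recursive", true);
        Job job = Job.getInstance(conf, "wordcount");
        FileInputFormat.addInputPath(job, new Path("/tmp"));
        // ... set jar, mapper, reducer and output path as usual, then job.waitForCompletion(true)
    }
}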
Failed to run wordcount on YARN
Hi, I just started to try out Hadoop 2.0. I am using the 2.0.5-alpha package and followed http://hadoop.apache.org/docs/r2.0.5-alpha/hadoop-project-dist/hadoop-common/ClusterSetup.html to set up a cluster in non-secure mode. HDFS works fine with the client tools, but when I run the wordcount example, there are errors:

./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.5-alpha.jar wordcount /tmp /out

13/07/12 15:05:53 INFO mapreduce.Job: Task Id : attempt_1373609123233_0004_m_04_0, Status : FAILED
Error: java.io.FileNotFoundException: Path is not a file: /tmp/hadoop-yarn
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:42)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1317)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1276)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1252)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1225)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:403)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:239)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:40728)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1741)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1737)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1735)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:90)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:986)
at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:974)
at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:157)
at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:124)
at org.apache.hadoop.hdfs.DFSInputStream.init(DFSInputStream.java:117)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1131)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:244)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:77)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:713)
at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:89)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:519)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:158)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:153)

I checked HDFS and found /tmp/hadoop-yarn is there; the dir's owner is the same as the job user. And to be sure it works, I also created /tmp/hadoop-yarn on the local fs. None of it works. Any idea what might be the problem? Thx! Best Regards, Raymond Liu
what's the typical scan latency?
Hi, If all the data is already in the RS block cache, what's the typical scan latency for scanning a few rows from a, say, several-GB table (with dozens of regions) on a small cluster with, say, 4 RSes? A few ms? Tens of ms? Or more? Best Regards, Raymond Liu
RE: what's the typical scan latency?
Thanks Amit. In my environment, I run dozens of clients reading about 5-20K of data per scan concurrently, and the average read latency for cached data is around 5-20ms. So it seems there must be something wrong with my cluster env or application. Or did you run that with multiple clients? Depends on so many environment-related variables, and on the data as well. But to give you a number after all: one of our clusters is on EC2, 6 RS, on m1.xlarge machines (network performance 'high' according to AWS); 90% of the time we do reads; our avg data size is 2K, block cache at 20K, 100 rows per scan avg, bloom filters on at the 'ROW' level, 40% of heap dedicated to block cache (note that it contains several other bits and pieces), and I would say our average latency for cached data (~97% blockCacheHitCachingRatio) is 3-4ms. File system access is much, much more painful, especially on EC2 m1.xlarge where you really can't tell what's going on, as far as I can tell. To tell you the truth, as I see it, this is an abuse (for our use case) of the HBase store, and for cache-like behavior I would recommend going with something like Redis. On Mon, Jun 3, 2013 at 12:13 PM, ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com wrote: What is it that you are observing now? Regards Ram On Mon, Jun 3, 2013 at 2:00 PM, Liu, Raymond raymond@intel.com wrote: snip
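(For anyone wanting to measure the same thing, a rough timing sketch against the 0.94-era client API; the table name, row range and caching value are made up for illustration:)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanLatencySketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable");
        Scan scan = new Scan(Bytes.toBytes("row-0100"), Bytes.toBytes("row-0110"));
        scan.setCaching(100); // pull the whole small range back in one RPC if possible
        long start = System.nanoTime();
        ResultScanner scanner = table.getScanner(scan);
        int rows = 0;
        for (Result r : scanner) rows++;
        scanner.close();
        System.out.printf("scanned %d rows in %.2f ms%n", rows, (System.nanoTime() - start) / 1e6);
        table.close();
    }
}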
RE: checkAnd...
How about this one: https://issues.apache.org/jira/browse/HBASE-8542 Best Regards, Raymond Liu -Original Message- From: Lior Schachter [mailto:lior...@gmail.com] Sent: Thursday, May 16, 2013 1:18 AM To: user Subject: Re: checkAnd... Yes, I believe this will cover most of the use cases. Lior On Tue, May 14, 2013 at 9:25 PM, Mike Spreitzer mspre...@us.ibm.com wrote: Why not go whole hog and create checkAndMultiMutate (for all varieties of mutation) (all on the same row)? Thanks, Mike From: Lior Schachter lior...@gmail.com To: user user@hbase.apache.org, Date: 04/27/2013 10:31 AM Subject: Re: checkAnd... Found the checkAndIncrement JIRA - HBASE-6712 https://issues.apache.org/jira/browse/HBASE-6712 . Would be nice to also have checkAndAppend. Any ideas how to solve the use case I described? On Sat, Apr 27, 2013 at 4:46 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Ted, Would it be a good idea to add it? Should we open a JIRA and implement checkAndIncrement? Might be pretty simple. JM 2013/4/27 Ted Yu yuzhih...@gmail.com: Take a look at the following method in HRegionServer: public boolean checkAndPut(final byte[] regionName, final byte[] row, final byte[] family, final byte[] qualifier, final byte[] value, final Put put) throws IOException { You can create checkAndIncrement() in a similar way. Cheers On Sat, Apr 27, 2013 at 9:02 PM, Lior Schachter lior...@gmail.com wrote: Hi, I want to increment a cell value only after checking a condition on another cell. I could find checkAndPut/checkAndDelete on HTableInterface. It seems that checkAndIncrement (and checkAndAppend) are missing. Can you suggest a workaround for my use case? Working with version 0.94.5. Thanks, Lior
RE: How to implement this check put and then update something logic?
Well, this did come from a graph domain. However, I think this could be a common problem whenever you need to update something according to its original value and a simple checkAndPut on a single value won't work. Another example: if you want to implement something like UPDATE, you want to know whether a new value was inserted or an old value was updated. That isn't easy now: you need to checkAndPut on null; if not null, you get the value and checkAndPut on that value, since you want to make sure the column is still there; and if that fails, you loop back to the null check. So I think a little enhancement to the current HBase atomic operations could greatly improve usability for similar problems. Or maybe there is already a solution for this type of issue? Maybe this problem is more in the graph domain? I know that there are projects aimed at representing graphs at large scale better. I'm saying this since you have one ID referencing another ID (using target ID). On May 10, 2013, at 11:47 AM, Liu, Raymond raymond@intel.com wrote: snip
RE: How to implement this check put and then update something logic?
Thanks. It seems there is no better solution? We really need a GetAndPut atomic op here... You can do this by looping over a checkAndPut operation until it succeeds. -Mike On Thu, May 9, 2013 at 8:52 PM, Liu, Raymond raymond@intel.com wrote: Any suggestion? snip
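(A sketch of the loop Mike describes, emulating GetAndPut on the 0.94 client API; the names are illustrative. checkAndPut with a null expected value doubles as the "column absent" check, so one loop covers both the insert and the update case:)

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;

public class GetAndPutSketch {
    // Returns the value that was replaced (null if the column was absent).
    static byte[] getAndPut(HTable table, byte[] row, byte[] cf, byte[] qual,
                            byte[] newValue) throws Exception {
        while (true) {
            Get get = new Get(row);
            get.addColumn(cf, qual);
            byte[] oldValue = table.get(get).getValue(cf, qual); // may be null
            Put put = new Put(row);
            put.add(cf, qual, newValue);
            // Succeeds only if the cell still holds oldValue (or is still absent).
            if (table.checkAndPut(row, cf, qual, oldValue, put)) {
                return oldValue;
            }
            // A concurrent writer got in between the get and the checkAndPut; retry.
        }
    }
}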
RE: How to implement this check put and then update something logic?
Any suggestion? snip
How to implement this check put and then update something logic?
Hi, Say I have four fields for one record: id, status, targetid, and count. Status is on or off, targetid can reference another id, and count records the number of on statuses across all targetids from the same id. Records can be added/deleted, or updated to change their status. I could put count in another table, or in the same table; it doesn't matter, as long as it can work. My question is: how can I ensure the correctness of the count field when multiple clients update the table concurrently? The closest thing I can think of is checkAndPut, but I need two steps to find out the change of count, since checkAndPut etc. can only test a single value and only with an EQUAL comparator. Thus I can only check against null first, then against on or off. When things change during these two steps, I need to retry from the first step until it succeeds. This could be bad when a lot of concurrent ops are going on. And then I need to update count by checkAndIncrement. Though, if the above problem could be solved, the order of -1/+1 might not matter for the final result, at some intermediate time it might not reflect the real count of that time. I know this kind of transaction is not the target of HBase and the app should take care of it; so then, what's the best practice on this? Any quick, simple solution for my problem? A client RowLock could solve this issue, but it seems to me that it is not safe, and it is not recommended and deprecated? Btw, is it possible or practical to implement something like PutAndGet, which puts the new row and returns the old row back to the client? That would help a lot for my case. Best Regards, Raymond Liu
RE: How to implement this check put and then update something logic?
Btw, is it possible or practical to implement something like PutAndGet, which puts the new row and returns the old row back to the client? That would help a lot for my case. Oh, I realized it would be better named GetAndMutate: do the Mutate anyway, but return the original value. Best Regards, Raymond Liu
RE: RE: HBase random read performance
So what is lacking here? Should the action also be parallelized inside the RS, per region, instead of just in parallel at the RS level? It seems this would be rather difficult to implement, and for Get it might not be worth it. I looked at src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java in 0.94. In processBatchCallback(), starting at line 1538:

// step 1: break up into regionserver-sized chunks and build the data structs
Map<HRegionLocation, MultiAction<R>> actionsByServer = new HashMap<HRegionLocation, MultiAction<R>>();
for (int i = 0; i < workingList.size(); i++) {

So we do group individual actions by server. FYI On Mon, Apr 15, 2013 at 6:30 AM, Ted Yu yuzhih...@gmail.com wrote: Doug made a good point. Take a look at the performance gain for parallel scan (bottom chart compared to top chart): https://issues.apache.org/jira/secure/attachment/12578083/FDencode.png See https://issues.apache.org/jira/browse/HBASE-8316?focusedCommentId=13628300&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13628300 for an explanation of the two methods. Cheers On Mon, Apr 15, 2013 at 6:21 AM, Doug Meil doug.m...@explorysmedical.com wrote: Hi there, regarding this... "We are passing random 10,000 row-keys as input, while HBase is taking around 17 secs to return 10,000 records." Given that you are generating 10,000 random keys, your multi-get is very likely hitting all 5 nodes of your cluster. Historically, multi-get used to first sort the requests by RS and then *serially* go to each RS to process the multi-get. I'm not sure of the current (0.94.x) behavior, i.e. whether it multi-threads or not. One thing you might want to consider is confirming that client behavior, and if it's not multi-threading, then perform a test that does the same RS sorting via http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#getRegionLocation%28byte[]%29 and then spin up your own threads (one per target RS) and see what happens. On 4/15/13 9:04 AM, Ankit Jain ankitjainc...@gmail.com wrote: Hi Liang, Thanks Liang for the reply. Ans1: I tried using an HFile block size of 32 KB with the bloom filter enabled. The random read performance is 10,000 records in 23 secs. Ans2: We are retrieving all the 10,000 rows in one call. Ans3: Disk detail: Model Number: ST2000DM001-1CH164 Serial Number: Z1E276YF Please suggest some more optimizations. Thanks, Ankit Jain On Mon, Apr 15, 2013 at 5:11 PM, 谢良 xieli...@xiaomi.com wrote: First, it probably won't help to set the block size to 4KB; please refer to the beginning of HFile.java: "Smaller blocks are good for random access, but require more memory to hold the block index, and may be slower to create (because we must flush the compressor stream at the conclusion of each data block, which leads to an FS I/O flush). Further, due to the internal caching in the compression codec, the smallest possible block size would be around 20KB-30KB." Second, is it a single-threaded test client or multi-threaded? We couldn't expect too much if the requests are sent one by one. Third, could you provide more info about your DN disk numbers and IO utilization? Thanks, Liang From: Ankit Jain [ankitjainc...@gmail.com] Sent: April 15, 2013 18:53 To: user@hbase.apache.org Subject: Re: HBase random read performance Hi Anoop, Thanks for the reply. I tried setting the HFile block size to 4KB and also enabled the bloom filter (ROW). The maximum read performance that I was able to achieve is 10,000 records in 14 secs (the size of a record is 1.6KB).
Please suggest some tuning. Thanks, Ankit Jain On Mon, Apr 15, 2013 at 4:12 PM, Rishabh Agrawal rishabh.agra...@impetus.co.in wrote: Interesting. Can you explain why this happens? -Original Message- From: Anoop Sam John [mailto:anoo...@huawei.com] Sent: Monday, April 15, 2013 3:47 PM To: user@hbase.apache.org Subject: RE: HBase random read performance Ankit, I guess you might be having the default HFile block size, which is 64KB. For random gets a lower value will be better. Try with something like 8KB and check the latency? Yeah, of course blooms can help (if a major compaction was not done at the time of testing). -Anoop- From: Ankit Jain [ankitjainc...@gmail.com] Sent: Saturday, April 13, 2013 11:01 AM To: user@hbase.apache.org Subject: HBase random read performance Hi All, We are using HBase 0.94.5 and Hadoop
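(A sketch of the per-RS threading Doug suggests, on the 0.94 client API; the grouping key, pool sizing and table name are illustrative, and each worker gets its own HTable since HTable is not thread-safe:)

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;

public class ParallelMultiGetSketch {
    static List<Result> parallelGet(final Configuration conf, final String tableName,
                                    List<byte[]> rows) throws Exception {
        // Group the random keys by the region server hosting them.
        HTable locator = new HTable(conf, tableName);
        Map<String, List<Get>> byServer = new HashMap<String, List<Get>>();
        for (byte[] row : rows) {
            HRegionLocation loc = locator.getRegionLocation(row);
            String server = loc.getHostname() + ":" + loc.getPort();
            List<Get> gets = byServer.get(server);
            if (gets == null) byServer.put(server, gets = new ArrayList<Get>());
            gets.add(new Get(row));
        }
        locator.close();
        // One thread (and one HTable) per target region server.
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, byServer.size()));
        List<Future<Result[]>> futures = new ArrayList<Future<Result[]>>();
        for (final List<Get> chunk : byServer.values()) {
            futures.add(pool.submit(new Callable<Result[]>() {
                public Result[] call() throws Exception {
                    HTable t = new HTable(conf, tableName);
                    try { return t.get(chunk); } finally { t.close(); }
                }
            }));
        }
        List<Result> all = new ArrayList<Result>();
        for (Future<Result[]> f : futures) all.addAll(Arrays.asList(f.get()));
        pool.shutdown();
        return all;
    }
}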
RE: composite query on hbase and rcfile
I guess Rob means using one query to query an RCFile table and an HBase table at the same time. If your query is over two tables, one on RCFile and another on HBase through the HBase storage handler, I think that should be OK. Best Regards, Raymond Liu What does a composite query mean? Hive queries don't depend on the file format; they can be run on text files, sequence files, RCFiles, etc. On Thu, Apr 11, 2013 at 6:14 AM, ur lops urlop...@gmail.com wrote: Hi, Does anyone know if Hive can run a composite query over RCFILE and HBASE in the same query? A quick answer will be highly appreciated. Thanks in advance. Rob
Does a major compact flush memstore?
It seems to me that a major_compact 'table' command from the HBase shell does not flush the memstore? When I am done with the major compaction, there is still some data in the memstore, which is flushed out to disk when I shut down the HBase cluster. Best Regards, Raymond Liu
RE: Does a major compact flush memstore?
I tried both the HBase shell's major_compact cmd and the Java API HBaseAdmin.majorCompact() on the table name. They don't flush the memstore to disk; the compact cmd doesn't seem to do that either. I haven't read enough of the related code, but I am wondering: is that because there is a size threshold before a memstore is flushed, so a user-invoked compaction doesn't force a flush? Best Regards, Raymond Liu Did you try from the Java API? If the flush does not happen we may need to fix it. Regards, Ram On Tue, Mar 12, 2013 at 1:04 PM, Liu, Raymond raymond@intel.com wrote: snip
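(For reference, you can force the flush by hand before the compaction; both calls exist on 0.94's HBaseAdmin and both are asynchronous requests. The table name is illustrative:)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class FlushThenCompactSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        admin.flush("mytable");        // write the memstore out first
        admin.majorCompact("mytable"); // then request the major compaction
        admin.close();
    }
}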
RE: Does a major compact flush memstore?
St.Ack, I am not sure what the design idea behind it is. But if I want to invoke a major compaction manually, I guess what I want is for all the separate files and the memstore to be combined into one file. If I don't write anything new there, from the user's point of view I would assume it ends up as a single store file per region. Best Regards, Raymond Liu Raymond: Major compaction does not first flush. Should it, or should it be an option? St.Ack On Tue, Mar 12, 2013 at 6:46 PM, Liu, Raymond raymond@intel.com wrote: snip
RE: How HBase perform per-column scan?
Just curious, won't the ROWCOL bloom filter work for this case? Best Regards, Raymond Liu As per the above, you will need a full table scan on that CF. As Ted said, consider having a look at your schema design. -Anoop- On Sun, Mar 10, 2013 at 8:10 PM, Ted Yu yuzhih...@gmail.com wrote: bq. physically column family should be able to perform efficiently (storage layer) When you scan a row, data for different column families would be brought into memory (if you don't utilize HBASE-5416). Take a look at: https://issues.apache.org/jira/browse/HBASE-5416?focusedCommentId=13541258&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13541258 which was based on the settings described in: https://issues.apache.org/jira/browse/HBASE-5416?focusedCommentId=13541191&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13541191 This boils down to your schema design. If possible, consider extracting column C into its own column family. Cheers On Sun, Mar 10, 2013 at 7:14 AM, PG pengyunm...@gmail.com wrote: Hi, Ted and Anoop, thanks for your notes. I am talking about a column rather than a column family, since physically column families should be able to perform efficiently (at the storage layer, CFs are stored separately). But columns of the same column family may be mixed physically, and that makes filtering on column values hard... So I want to know if there is any mechanism in HBase that works on this... Regards, Yun On Mar 10, 2013, at 10:01 AM, Ted Yu yuzhih...@gmail.com wrote: Hi, Yun: Take a look at HBASE-5416 (Improve performance of scans with some kind of filters), which is in the 0.94.5 release. In your case, you can use a filter which specifies column C as the essential family. Here I interpret column C as a column family. Cheers On Sat, Mar 9, 2013 at 11:11 AM, yun peng pengyunm...@gmail.com wrote: Hi, All, I want to find all existing values for a given column in HBase. Would that result in a full-table scan in HBase? For example, given a column C, the table has a very large number of rows, of which few rows (say only 1 row) have non-empty values for column C. Would HBase still use a full table scan to find this row? Or does HBase have any optimization for this kind of query? Thanks... Regards, Yun
RE: How HBase perform per-column scan?
Hmm, I don't mean querying the bloom filter directly. I mean the StoreFileScanner will query the ROWCOL bloom filter to see whether it needs a seek or not. And I guess this will be performed on every row without needing to specify row keys? ROWCOL bloom says whether, for a given row (rowkey), a given column (qualifier) is present in an HFile or not. But the user doesn't know the rowkeys; he wants all the rows with column 'x'. -Anoop- From: Liu, Raymond [raymond@intel.com] Sent: Monday, March 11, 2013 7:43 AM To: user@hbase.apache.org Subject: RE: How HBase perform per-column scan? snip
Is there any way to balance one table?
Hi Is there any way to balance just one table? I found one of my tables is not balanced, while all the other tables are balanced. So I want to fix this table. Best Regards, Raymond Liu
RE: Is there any way to balance one table?
0.94.1 Any cmd in shell? Or do I need to change the balance threshold to 0 and run the global balancer cmd in shell? Best Regards, Raymond Liu -Original Message- From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Wednesday, February 20, 2013 9:09 AM To: user@hbase.apache.org Subject: Re: Is there any way to balance one table? What version of HBase are you using? 0.94 has per-table load balancing. Cheers On Tue, Feb 19, 2013 at 5:01 PM, Liu, Raymond raymond@intel.com wrote: Hi Is there any way to balance just one table? I found one of my tables is not balanced, while all the other tables are balanced. So I want to fix this table. Best Regards, Raymond Liu
RE: Is there any way to balance one table?
I chose to move regions manually. Any other approaches? 0.94.1 Any cmd in shell? Or do I need to change the balance threshold to 0 and run the global balancer cmd in shell? Best Regards, Raymond Liu -Original Message- From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Wednesday, February 20, 2013 9:09 AM To: user@hbase.apache.org Subject: Re: Is there any way to balance one table? What version of HBase are you using? 0.94 has per-table load balancing. Cheers On Tue, Feb 19, 2013 at 5:01 PM, Liu, Raymond raymond@intel.com wrote: Hi Is there any way to balance just one table? I found one of my tables is not balanced, while all the other tables are balanced. So I want to fix this table. Best Regards, Raymond Liu
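For the record, the manual region move (and the balancer itself) can be driven from the hbase shell; the encoded region name and the server name below are placeholders, not values from this thread:

    hbase> move 'ENCODED_REGION_NAME', 'host2,60020,1361840666000'
    hbase> balance_switch true
    hbase> balancer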
RE: Is there any way to balance one table?
Hi I do call the balancer, but it seems it doesn't work. Might be due to this table being small and the overall region number difference being within the threshold? -Original Message- From: Jean-Marc Spaggiari [mailto:jean-m...@spaggiari.org] Sent: Wednesday, February 20, 2013 10:59 AM To: user@hbase.apache.org Subject: Re: Is there any way to balance one table? Hi Liu, Why did you not simply call the balancer? If the other tables are already balanced, it should not touch them and will only balance the table which is not balanced? JM 2013/2/19, Liu, Raymond raymond@intel.com: I chose to move regions manually. Any other approaches? 0.94.1 Any cmd in shell? Or do I need to change the balance threshold to 0 and run the global balancer cmd in shell? Best Regards, Raymond Liu -Original Message- From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Wednesday, February 20, 2013 9:09 AM To: user@hbase.apache.org Subject: Re: Is there any way to balance one table? What version of HBase are you using? 0.94 has per-table load balancing. Cheers On Tue, Feb 19, 2013 at 5:01 PM, Liu, Raymond raymond@intel.com wrote: Hi Is there any way to balance just one table? I found one of my tables is not balanced, while all the other tables are balanced. So I want to fix this table. Best Regards, Raymond Liu
RE: Is there any way to balance one table?
I mean the region number is small. Overall I have, say, 3000 regions on 4 nodes, while this table only has 96 regions. It won't be 24 for each region server; instead, it will be something like 19/30/23/21 etc. Does this mean that I need to limit the slop to 0.02 etc. so that the balancer actually runs on this table? Best Regards, Raymond Liu From: Marcos Ortiz [mailto:mlor...@uci.cu] Sent: Wednesday, February 20, 2013 11:44 AM To: user@hbase.apache.org Cc: Liu, Raymond Subject: Re: Is there any way to balance one table? What is the size of your table? On 02/19/2013 10:40 PM, Liu, Raymond wrote: Hi I do call the balancer, but it seems it doesn't work. Might be due to this table being small and the overall region number difference being within the threshold? -Original Message- From: Jean-Marc Spaggiari [mailto:jean-m...@spaggiari.org] Sent: Wednesday, February 20, 2013 10:59 AM To: user@hbase.apache.org Subject: Re: Is there any way to balance one table? Hi Liu, Why did you not simply call the balancer? If the other tables are already balanced, it should not touch them and will only balance the table which is not balanced? JM 2013/2/19, Liu, Raymond raymond@intel.com: I chose to move regions manually. Any other approaches? 0.94.1 Any cmd in shell? Or do I need to change the balance threshold to 0 and run the global balancer cmd in shell? Best Regards, Raymond Liu -Original Message- From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Wednesday, February 20, 2013 9:09 AM To: user@hbase.apache.org Subject: Re: Is there any way to balance one table? What version of HBase are you using? 0.94 has per-table load balancing. Cheers On Tue, Feb 19, 2013 at 5:01 PM, Liu, Raymond raymond@intel.com wrote: Hi Is there any way to balance just one table? I found one of my tables is not balanced, while all the other tables are balanced. So I want to fix this table. Best Regards, Raymond Liu -- Marcos Ortiz Valmaseda, Product Manager Data Scientist at UCI Blog: http://marcosluis2186.posterous.com Twitter: @marcosluis2186
RE: Is there any way to balance one table?
Yeah, since balancing is already done per table, why is the slop not calculated per table as well... You're right. Default sloppiness is 20%: this.slop = conf.getFloat("hbase.regions.slop", (float) 0.2); (src/main/java/org/apache/hadoop/hbase/master/DefaultLoadBalancer.java) Meaning, the region count on any server can be as far as 20% from the average region count. You can tighten sloppiness. On Tue, Feb 19, 2013 at 7:40 PM, Liu, Raymond raymond@intel.com wrote: Hi I do call the balancer, but it seems it doesn't work. Might be due to this table being small and the overall region number difference being within the threshold? -Original Message- From: Jean-Marc Spaggiari [mailto:jean-m...@spaggiari.org] Sent: Wednesday, February 20, 2013 10:59 AM To: user@hbase.apache.org Subject: Re: Is there any way to balance one table? Hi Liu, Why did you not simply call the balancer? If the other tables are already balanced, it should not touch them and will only balance the table which is not balanced? JM 2013/2/19, Liu, Raymond raymond@intel.com: I chose to move regions manually. Any other approaches? 0.94.1 Any cmd in shell? Or do I need to change the balance threshold to 0 and run the global balancer cmd in shell? Best Regards, Raymond Liu -Original Message- From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Wednesday, February 20, 2013 9:09 AM To: user@hbase.apache.org Subject: Re: Is there any way to balance one table? What version of HBase are you using? 0.94 has per-table load balancing. Cheers On Tue, Feb 19, 2013 at 5:01 PM, Liu, Raymond raymond@intel.com wrote: Hi Is there any way to balance just one table? I found one of my tables is not balanced, while all the other tables are balanced. So I want to fix this table. Best Regards, Raymond Liu
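For anyone wanting to try this, hbase.regions.slop is a master-side setting; a sketch for hbase-site.xml (the 0.02 value is just the figure floated earlier in this thread, not a recommendation):

    <property>
      <name>hbase.regions.slop</name>
      <value>0.02</value>
    </property>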
RE: Is there any way to balance one table?
Hmm, in order to have the 96-region table balanced within 20% on a 3000-region cluster when all other tables are balanced, the slop would need to be around 20%/30, say 0.006. Won't that be too small? Yes, Raymond. You should lower sloppiness. On Tue, Feb 19, 2013 at 7:48 PM, Liu, Raymond raymond@intel.com wrote: I mean the region number is small. Overall I have, say, 3000 regions on 4 nodes, while this table only has 96 regions. It won't be 24 for each region server; instead, it will be something like 19/30/23/21 etc. Does this mean that I need to limit the slop to 0.02 etc. so that the balancer actually runs on this table? Best Regards, Raymond Liu From: Marcos Ortiz [mailto:mlor...@uci.cu] Sent: Wednesday, February 20, 2013 11:44 AM To: user@hbase.apache.org Cc: Liu, Raymond Subject: Re: Is there any way to balance one table? What is the size of your table? On 02/19/2013 10:40 PM, Liu, Raymond wrote: Hi I do call the balancer, but it seems it doesn't work. Might be due to this table being small and the overall region number difference being within the threshold? -Original Message- From: Jean-Marc Spaggiari [mailto:jean-m...@spaggiari.org] Sent: Wednesday, February 20, 2013 10:59 AM To: user@hbase.apache.org Subject: Re: Is there any way to balance one table? Hi Liu, Why did you not simply call the balancer? If the other tables are already balanced, it should not touch them and will only balance the table which is not balanced? JM 2013/2/19, Liu, Raymond raymond@intel.com: I chose to move regions manually. Any other approaches? 0.94.1 Any cmd in shell? Or do I need to change the balance threshold to 0 and run the global balancer cmd in shell? Best Regards, Raymond Liu -Original Message- From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Wednesday, February 20, 2013 9:09 AM To: user@hbase.apache.org Subject: Re: Is there any way to balance one table? What version of HBase are you using? 0.94 has per-table load balancing. Cheers On Tue, Feb 19, 2013 at 5:01 PM, Liu, Raymond raymond@intel.com wrote: Hi Is there any way to balance just one table? I found one of my tables is not balanced, while all the other tables are balanced. So I want to fix this table. Best Regards, Raymond Liu -- Marcos Ortiz Valmaseda, Product Manager Data Scientist at UCI Blog: http://marcosluis2186.posterous.com Twitter: @marcosluis2186
RE: Is there any way to balance one table?
You mean the slop is also applied per table? Weird, then it should work for my case; let me check again. Best Regards, Raymond Liu bq. On a 3000 region cluster Balancing is per-table, meaning the total number of regions doesn't come into play. On Tue, Feb 19, 2013 at 7:55 PM, Liu, Raymond raymond@intel.com wrote: Hmm, in order to have the 96-region table balanced within 20% on a 3000-region cluster when all other tables are balanced, the slop would need to be around 20%/30, say 0.006. Won't that be too small? Yes, Raymond. You should lower sloppiness. On Tue, Feb 19, 2013 at 7:48 PM, Liu, Raymond raymond@intel.com wrote: I mean the region number is small. Overall I have, say, 3000 regions on 4 nodes, while this table only has 96 regions. It won't be 24 for each region server; instead, it will be something like 19/30/23/21 etc. Does this mean that I need to limit the slop to 0.02 etc. so that the balancer actually runs on this table? Best Regards, Raymond Liu From: Marcos Ortiz [mailto:mlor...@uci.cu] Sent: Wednesday, February 20, 2013 11:44 AM To: user@hbase.apache.org Cc: Liu, Raymond Subject: Re: Is there any way to balance one table? What is the size of your table? On 02/19/2013 10:40 PM, Liu, Raymond wrote: Hi I do call the balancer, but it seems it doesn't work. Might be due to this table being small and the overall region number difference being within the threshold? -Original Message- From: Jean-Marc Spaggiari [mailto:jean-m...@spaggiari.org] Sent: Wednesday, February 20, 2013 10:59 AM To: user@hbase.apache.org Subject: Re: Is there any way to balance one table? Hi Liu, Why did you not simply call the balancer? If the other tables are already balanced, it should not touch them and will only balance the table which is not balanced? JM 2013/2/19, Liu, Raymond raymond@intel.com: I chose to move regions manually. Any other approaches? 0.94.1 Any cmd in shell? Or do I need to change the balance threshold to 0 and run the global balancer cmd in shell? Best Regards, Raymond Liu -Original Message- From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Wednesday, February 20, 2013 9:09 AM To: user@hbase.apache.org Subject: Re: Is there any way to balance one table? What version of HBase are you using? 0.94 has per-table load balancing. Cheers On Tue, Feb 19, 2013 at 5:01 PM, Liu, Raymond raymond@intel.com wrote: Hi Is there any way to balance just one table? I found one of my tables is not balanced, while all the other tables are balanced. So I want to fix this table. Best Regards, Raymond Liu -- Marcos Ortiz Valmaseda, Product Manager Data Scientist at UCI Blog: http://marcosluis2186.posterous.com Twitter: @marcosluis2186
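Working through the numbers from this thread under per-table balancing: the table has 96 regions on 4 region servers, so the per-table average is 96/4 = 24 regions per server. With the default slop of 0.2, the tolerated band is roughly 24 * (1 +/- 0.2), i.e. about 19.2 to 28.8 regions per server, so the 19/30/23/21 spread reported above falls outside the band (30 > 28.8) and the balancer should act on it. The cluster-wide total of 3000 regions never enters the calculation, which is why a slop as small as 0.006 should not be needed.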
RE: why my test result on dfs short circuit read is slower?
I have tried to tune io.file.buffer.size to 128K instead of 4K. Short circuit read performance is still worse than reading through the datanode. I am starting to wonder, does shortcircuit read really help under the hadoop 1.1.1 version? I googled and found a few people mentioning they got a 2x gain or so on CDH etc. But I really can't find out what else I can do to make it even just catch up with the normal read path. It seems to me that, with short circuit read enabled, BlockReaderLocal reads data in 512/4096-byte units (checksum check enabled/skipped), while when it goes through the datanode, BlockSender.sendChunks reads and sends data in 64K-byte units? Is that true? And if so, wouldn't that explain why reading through the datanode is faster, since it reads data in bigger chunks? Best Regards, Raymond Liu -Original Message- From: Liu, Raymond [mailto:raymond@intel.com] Sent: Saturday, February 16, 2013 2:23 PM To: user@hadoop.apache.org Subject: RE: why my test result on dfs short circuit read is slower? Hi Arpit Gupta Yes, this way also confirms that short circuit read is enabled on my cluster. 13/02/16 14:07:34 DEBUG hdfs.DFSClient: Short circuit read is true 13/02/16 14:07:34 DEBUG hdfs.DFSClient: New BlockReaderLocal for file /mnt/DP_disk4/raymond/hdfs/data/current/subdir63/blk_-2736548898990727638 of size 134217728 startOffset 0 length 134217728 short circuit checksum false So, is there any possibility that some other setting might cause short circuit read to have worse performance than reading through the datanode? Raymond Another way to check if short circuit read is configured correctly: as the user who is configured for short circuit read, issue the following command on a node where you expect the data to be read locally. export HADOOP_ROOT_LOGGER=debug,console; hadoop dfs -cat /path/to/file_on_hdfs On the console you should see something like hdfs.DFSClient: New BlockReaderLocal for file This would confirm that short circuit read is happening. -- Arpit Gupta Hortonworks Inc. http://hortonworks.com/ On Feb 15, 2013, at 9:53 PM, Liu, Raymond raymond@intel.com wrote: Hi Harsh Yes, I did set both of these, though not in hbase-site.xml but hdfs-site.xml. And I have double confirmed that local reads are performed, since there are no errors in the datanode logs, and by watching lo network IO. If you want HBase to leverage the shortcircuit, the DN config dfs.block.local-path-access.user should be set to the user running HBase (i.e. hbase, for example), and the hbase-site.xml should have dfs.client.read.shortcircuit defined in all its RegionServers. Doing this wrong could result in a performance penalty and some warn-logging, as local reads will be attempted but will begin to fail. On Sat, Feb 16, 2013 at 8:40 AM, Liu, Raymond raymond@intel.com wrote: Hi I tried to use short circuit read to improve my hbase cluster MR scan performance. I have the following settings in hdfs-site.xml: dfs.client.read.shortcircuit set to true, dfs.block.local-path-access.user set to the MR job runner. The cluster is 1+4 nodes and each data node has 16 CPUs/4 HDDs, with all hbase tables major compacted so all data is local. I had hoped that short circuit read would improve the performance, but the test result is that with short circuit read enabled, the performance actually dropped 10-15%. Say scanning a 50G table costs around 100s instead of 90s. My hadoop version is 1.1.1, any idea on this? Thx! Best Regards, Raymond Liu -- Harsh J
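For anyone reproducing this test, io.file.buffer.size is a core-site.xml setting in hadoop 1.x; a sketch of the 128K value tried above (131072 bytes):

    <property>
      <name>io.file.buffer.size</name>
      <value>131072</value>
    </property>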
RE: why my test result on dfs short circuit read is slower?
Alright, I think in my sequential read scenario, it is possible that shortcircuit read is actually slower than reading through the datanode. When reading through the datanode, the FS read operation is done by the datanode daemon, while data processing is done by the client. Thus while the client is processing data, the datanode can read further data at the same time and write it to the local socket for the client to read. It takes very little time for the client to read from the local socket. Whereas when the client reads the native FS directly, all the work is done by the client; it is blocked for more time reading data from the native FS than it would be reading from a local socket. Overall, when CPU is not the bottleneck and the datanode can always have further data ready for the client (thanks to the sequential read pattern), the result is that shortcircuit read is slower even though it costs less CPU load. The CPU/IOWait load seems to justify my guess:

For a scan-only job:
Read through datanode: CPU 30-35% / IOWait ~50% / 300 seconds
Shortcircuit read: CPU 25-30% / IOWait ~40% / 330 seconds
Short circuit read is 10% slower.

For a job doing more calculation:
Read through datanode: CPU 80-90% / IOWait ~5-10% / 190 seconds
Shortcircuit read: CPU ~90% / IOWait ~2-3% / 160 seconds
Short circuit read is 15% faster.

So short circuit read is not always faster; especially when CPU is not the bottleneck and reads are sequential, it can be slower. This is the best explanation I can come up with for now. Any thoughts? Raymond I have tried to tune io.file.buffer.size to 128K instead of 4K. Short circuit read performance is still worse than reading through the datanode. I am starting to wonder, does shortcircuit read really help under the hadoop 1.1.1 version? I googled and found a few people mentioning they got a 2x gain or so on CDH etc. But I really can't find out what else I can do to make it even just catch up with the normal read path. It seems to me that, with short circuit read enabled, BlockReaderLocal reads data in 512/4096-byte units (checksum check enabled/skipped), while when it goes through the datanode, BlockSender.sendChunks reads and sends data in 64K-byte units? Is that true? And if so, wouldn't that explain why reading through the datanode is faster, since it reads data in bigger chunks? Best Regards, Raymond Liu -Original Message- From: Liu, Raymond [mailto:raymond@intel.com] Sent: Saturday, February 16, 2013 2:23 PM To: user@hadoop.apache.org Subject: RE: why my test result on dfs short circuit read is slower? Hi Arpit Gupta Yes, this way also confirms that short circuit read is enabled on my cluster. 13/02/16 14:07:34 DEBUG hdfs.DFSClient: Short circuit read is true 13/02/16 14:07:34 DEBUG hdfs.DFSClient: New BlockReaderLocal for file /mnt/DP_disk4/raymond/hdfs/data/current/subdir63/blk_-2736548898990727638 of size 134217728 startOffset 0 length 134217728 short circuit checksum false So, is there any possibility that some other setting might cause short circuit read to have worse performance than reading through the datanode? Raymond Another way to check if short circuit read is configured correctly: as the user who is configured for short circuit read, issue the following command on a node where you expect the data to be read locally. export HADOOP_ROOT_LOGGER=debug,console; hadoop dfs -cat /path/to/file_on_hdfs On the console you should see something like hdfs.DFSClient: New BlockReaderLocal for file This would confirm that short circuit read is happening. -- Arpit Gupta Hortonworks Inc. http://hortonworks.com/ On Feb 15, 2013, at 9:53 PM, Liu, Raymond raymond@intel.com wrote: Hi Harsh Yes, I did set both of these, though not in hbase-site.xml but hdfs-site.xml. And I have double confirmed that local reads are performed, since there are no errors in the datanode logs, and by watching lo network IO. If you want HBase to leverage the shortcircuit, the DN config dfs.block.local-path-access.user should be set to the user running HBase (i.e. hbase, for example), and the hbase-site.xml should have dfs.client.read.shortcircuit defined in all its RegionServers. Doing this wrong could result in a performance penalty and some warn-logging, as local reads will be attempted but will begin to fail. On Sat, Feb 16, 2013 at 8:40 AM, Liu, Raymond raymond@intel.com wrote: Hi I tried to use short circuit read to improve my hbase cluster MR scan performance. I have the following settings in hdfs-site.xml: dfs.client.read.shortcircuit set to true, dfs.block.local-path-access.user set to the MR job runner. The cluster is 1+4 nodes and each data node has 16 CPUs/4 HDDs, with all hbase tables major compacted so all data is local. I had hoped that short circuit read would improve the performance
why my test result on dfs short circuit read is slower?
Hi I tried to use short circuit read to improve my hbase cluster MR scan performance. I have the following settings in hdfs-site.xml: dfs.client.read.shortcircuit set to true, dfs.block.local-path-access.user set to the MR job runner. The cluster is 1+4 nodes and each data node has 16 CPUs/4 HDDs, with all hbase tables major compacted so all data is local. I had hoped that short circuit read would improve the performance, but the test result is that with short circuit read enabled, the performance actually dropped 10-15%. Say scanning a 50G table costs around 100s instead of 90s. My hadoop version is 1.1.1, any idea on this? Thx! Best Regards, Raymond Liu
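For readers trying to reproduce this setup, the two settings described above would look like this in hdfs-site.xml on hadoop 1.1.x; the user name is a placeholder for whichever account runs the MR job:

    <property>
      <name>dfs.client.read.shortcircuit</name>
      <value>true</value>
    </property>
    <property>
      <name>dfs.block.local-path-access.user</name>
      <value>mruser</value>
    </property>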
RE: why my test result on dfs short circuit read is slower?
Hi Liang Did you mean setting dfs.permissions to false? Is that all I need to do to disable the security feature? Because it seems to me that without changing dfs.block.local-path-access.user, dfs.permissions alone doesn't work; HBase still falls back to reading data through the datanode. Hi Raymond, did you enable the security feature in your cluster? There'll be no obvious benefit to be found if so. Regards, Liang ___ From: Liu, Raymond [raymond@intel.com] Sent: February 16, 2013 11:10 To: user@hadoop.apache.org Subject: why my test result on dfs short circuit read is slower? Hi I tried to use short circuit read to improve my hbase cluster MR scan performance. I have the following settings in hdfs-site.xml: dfs.client.read.shortcircuit set to true, dfs.block.local-path-access.user set to the MR job runner. The cluster is 1+4 nodes and each data node has 16 CPUs/4 HDDs, with all hbase tables major compacted so all data is local. I had hoped that short circuit read would improve the performance, but the test result is that with short circuit read enabled, the performance actually dropped 10-15%. Say scanning a 50G table costs around 100s instead of 90s. My hadoop version is 1.1.1, any idea on this? Thx! Best Regards, Raymond Liu
RE: why my test result on dfs short circuit read is slower?
Hi Harsh Yes, I did set both of these, though not in hbase-site.xml but hdfs-site.xml. And I have double confirmed that local reads are performed, since there are no errors in the datanode logs, and by watching lo network IO. If you want HBase to leverage the shortcircuit, the DN config dfs.block.local-path-access.user should be set to the user running HBase (i.e. hbase, for example), and the hbase-site.xml should have dfs.client.read.shortcircuit defined in all its RegionServers. Doing this wrong could result in a performance penalty and some warn-logging, as local reads will be attempted but will begin to fail. On Sat, Feb 16, 2013 at 8:40 AM, Liu, Raymond raymond@intel.com wrote: Hi I tried to use short circuit read to improve my hbase cluster MR scan performance. I have the following settings in hdfs-site.xml: dfs.client.read.shortcircuit set to true, dfs.block.local-path-access.user set to the MR job runner. The cluster is 1+4 nodes and each data node has 16 CPUs/4 HDDs, with all hbase tables major compacted so all data is local. I had hoped that short circuit read would improve the performance, but the test result is that with short circuit read enabled, the performance actually dropped 10-15%. Say scanning a 50G table costs around 100s instead of 90s. My hadoop version is 1.1.1, any idea on this? Thx! Best Regards, Raymond Liu -- Harsh J
RE: why my test result on dfs short circuit read is slower?
Hi Arpit Gupta Yes, this way also confirms that short circuit read is enabled on my cluster. 13/02/16 14:07:34 DEBUG hdfs.DFSClient: Short circuit read is true 13/02/16 14:07:34 DEBUG hdfs.DFSClient: New BlockReaderLocal for file /mnt/DP_disk4/raymond/hdfs/data/current/subdir63/blk_-2736548898990727638 of size 134217728 startOffset 0 length 134217728 short circuit checksum false So, is there any possibility that some other setting might cause short circuit read to have worse performance than reading through the datanode? Raymond Another way to check if short circuit read is configured correctly: as the user who is configured for short circuit read, issue the following command on a node where you expect the data to be read locally. export HADOOP_ROOT_LOGGER=debug,console; hadoop dfs -cat /path/to/file_on_hdfs On the console you should see something like hdfs.DFSClient: New BlockReaderLocal for file This would confirm that short circuit read is happening. -- Arpit Gupta Hortonworks Inc. http://hortonworks.com/ On Feb 15, 2013, at 9:53 PM, Liu, Raymond raymond@intel.com wrote: Hi Harsh Yes, I did set both of these, though not in hbase-site.xml but hdfs-site.xml. And I have double confirmed that local reads are performed, since there are no errors in the datanode logs, and by watching lo network IO. If you want HBase to leverage the shortcircuit, the DN config dfs.block.local-path-access.user should be set to the user running HBase (i.e. hbase, for example), and the hbase-site.xml should have dfs.client.read.shortcircuit defined in all its RegionServers. Doing this wrong could result in a performance penalty and some warn-logging, as local reads will be attempted but will begin to fail. On Sat, Feb 16, 2013 at 8:40 AM, Liu, Raymond raymond@intel.com wrote: Hi I tried to use short circuit read to improve my hbase cluster MR scan performance. I have the following settings in hdfs-site.xml: dfs.client.read.shortcircuit set to true, dfs.block.local-path-access.user set to the MR job runner. The cluster is 1+4 nodes and each data node has 16 CPUs/4 HDDs, with all hbase tables major compacted so all data is local. I had hoped that short circuit read would improve the performance, but the test result is that with short circuit read enabled, the performance actually dropped 10-15%. Say scanning a 50G table costs around 100s instead of 90s. My hadoop version is 1.1.1, any idea on this? Thx! Best Regards, Raymond Liu -- Harsh J
RE: why my test result on dfs short circuit read is slower?
It seems to me that, with short circuit read enabled, BlockReaderLocal reads data in 512/4096-byte units (checksum check enabled/skipped), while when it goes through the datanode, BlockSender.sendChunks reads and sends data in 64K-byte units? Is that true? And if so, wouldn't that explain why reading through the datanode is faster, since it reads data in bigger chunks? Best Regards, Raymond Liu -Original Message- From: Liu, Raymond [mailto:raymond@intel.com] Sent: Saturday, February 16, 2013 2:23 PM To: user@hadoop.apache.org Subject: RE: why my test result on dfs short circuit read is slower? Hi Arpit Gupta Yes, this way also confirms that short circuit read is enabled on my cluster. 13/02/16 14:07:34 DEBUG hdfs.DFSClient: Short circuit read is true 13/02/16 14:07:34 DEBUG hdfs.DFSClient: New BlockReaderLocal for file /mnt/DP_disk4/raymond/hdfs/data/current/subdir63/blk_-2736548898990727638 of size 134217728 startOffset 0 length 134217728 short circuit checksum false So, is there any possibility that some other setting might cause short circuit read to have worse performance than reading through the datanode? Raymond Another way to check if short circuit read is configured correctly: as the user who is configured for short circuit read, issue the following command on a node where you expect the data to be read locally. export HADOOP_ROOT_LOGGER=debug,console; hadoop dfs -cat /path/to/file_on_hdfs On the console you should see something like hdfs.DFSClient: New BlockReaderLocal for file This would confirm that short circuit read is happening. -- Arpit Gupta Hortonworks Inc. http://hortonworks.com/ On Feb 15, 2013, at 9:53 PM, Liu, Raymond raymond@intel.com wrote: Hi Harsh Yes, I did set both of these, though not in hbase-site.xml but hdfs-site.xml. And I have double confirmed that local reads are performed, since there are no errors in the datanode logs, and by watching lo network IO. If you want HBase to leverage the shortcircuit, the DN config dfs.block.local-path-access.user should be set to the user running HBase (i.e. hbase, for example), and the hbase-site.xml should have dfs.client.read.shortcircuit defined in all its RegionServers. Doing this wrong could result in a performance penalty and some warn-logging, as local reads will be attempted but will begin to fail. On Sat, Feb 16, 2013 at 8:40 AM, Liu, Raymond raymond@intel.com wrote: Hi I tried to use short circuit read to improve my hbase cluster MR scan performance. I have the following settings in hdfs-site.xml: dfs.client.read.shortcircuit set to true, dfs.block.local-path-access.user set to the MR job runner. The cluster is 1+4 nodes and each data node has 16 CPUs/4 HDDs, with all hbase tables major compacted so all data is local. I had hoped that short circuit read would improve the performance, but the test result is that with short circuit read enabled, the performance actually dropped 10-15%. Say scanning a 50G table costs around 100s instead of 90s. My hadoop version is 1.1.1, any idea on this? Thx! Best Regards, Raymond Liu -- Harsh J
RE: Multiple RS for serving one region
Is it also possible to control which disk the blocks are assigned to? Say when there are multiple disks on one node, I wish the blocks belonging to the local region to be distributed evenly across the disks. At present, it seems that they are not. Though if you take non-local regions' replica blocks into account, overall blocks are distributed evenly; but when you scan a table, you don't touch those replica blocks. Thus, an unevenly distributed local region's blocks might lead to a hotspot disk. There is a jira after all. It is HBASE-4755. On Tue, Jan 22, 2013 at 10:38 AM, Devaraj Das d...@hortonworks.com wrote: I'll raise a jira shortly (couldn't locate jiras that talk about this) and update here. But as it stands, I take it that people here find this feature beneficial (although not many people have chimed in yet). Yes, we'd probably need to work with Hadoop core to see this feature go through. It'll be great to hear from some facebook devs on this topic. On Tue, Jan 22, 2013 at 10:31 AM, Ted Yu yuzhih...@gmail.com wrote: The feature depends on hdfs support. Once we have that, we can implement this feature in HBase. Cheers On Tue, Jan 22, 2013 at 8:49 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: This sounds hugely useful to me and is one of those why doesn't HBase have that things that bugged me. Is there an issue to watch? http://search-hadoop.com/?q=region+failover+secondary&fc_project=HBase&fc_type=issue doesn't find any. Thanks, Otis -- HBASE Performance Monitoring - http://sematext.com/spm/index.html On Mon, Jan 21, 2013 at 7:55 PM, Jonathan Hsieh j...@cloudera.com wrote: The main motivation is to maintain good performance on RS failovers. This is also tied to hdfs and its block placement policy. Let me explain as I understand it. If we control the hdfs block placement strategy, we can write all blocks for an hfile (or for all hfiles related to a region) to the same set of data nodes. If the RS fails, they favor failover to a node that has a local copy of all the blocks. Today, when you write an hfile to hdfs, for each block the first replica goes to the local data node but the others get dispersed around the cluster randomly at a per-block granularity. The problem here is that if the rs fails, the new rs that gets the responsibility for the region has to read files that are spread all over the cluster, with roughly 1/nth of the data local. This means that the recovered region is slower until a compaction localizes the data again. They've gone in and modified their hdfs and their hbase to take advantage of this idea. I believe the randomization policy is enforced per region -- if an rs serves 25 regions, all the files within each region are sent to the same set of secondary/tertiary nodes, but each region sends to a different set of secondary/tertiary nodes. Jon. On Mon, Jan 21, 2013 at 3:48 PM, Devaraj Das d...@hortonworks.com wrote: In the 0.89-fb branch I stumbled upon stuff that indicated that there is a concept of secondary and tertiary regionservers. Could someone with more insight please shed some light on this? Might be useful to do the analysis on whether it makes sense for trunk.. Thanks Devaraj -- // Jonathan Hsieh (shay) // Software Engineer, Cloudera // j...@cloudera.com
RE: How is DataXceiver been used?
Hi Tariq Thanks, this blog is very informative! I roughly figured out the usage of xceivers. The weird thing is that, even according to this blog, I should have far more than enough xceivers for my full table scan job. Especially if it counts the number of storefiles instead of real blk_ files on the native FS: then I only have around 24 storefiles for 24 regions, since I did major compact them. And yet it exceeds the 1024 limit during a full table scan. Might need more investigation into my case to check out what is wrong with it. Hello Raymond, You might find this link http://blog.cloudera.com/blog/2012/03/hbase-hadoop-xceivers/ helpful. It explains the problem and solution in great detail. Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Thu, Jan 17, 2013 at 12:14 PM, Liu, Raymond raymond@intel.com wrote: Hi I have a table with about 24 regions on one regionserver, and each region has about 20 block files on hdfs. The xceiverCount is set to 1024; I had thought that this was quite enough, since at most 480 blocks would be opened. But when I ran an MR job to scan the table, with 24 map tasks each opening and scanning a different region at the same time, it turned out that the DataXceivers ran out... I am a little bit puzzled: those blocks will only be read by one task each, so shouldn't the region server scan blocks one by one? And since there are 480 blocks at most, how can it use up the dataXceivers? Best Regards, Raymond Liu
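The limit being hit here is the datanode-side xceiver cap, set in hdfs-site.xml via dfs.datanode.max.xcievers (note the historical misspelling in the property name). A sketch raising it to 4096, the value commonly suggested for HBase clusters, e.g. in the linked post:

    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>4096</value>
    </property>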
around 500 (CLOSE_WAIT) connection
Hi I have hadoop 1.1.1 and hbase 0.94.1, with around 300 regions on each region server. Right after the cluster is started, before I do anything, there are already around 500 CLOSE_WAIT connections from the regionserver process to the Datanode process. Is that normal? There seem to be a lot of bugs related to the CLOSE_WAIT issue in jira, but most of them are over a year old. Best Regards, Raymond Liu
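One quick way to count such connections on a regionserver host, assuming the datanode listens on the default hadoop 1.x data-transfer port 50010 (a diagnostic sketch, not from the original thread):

    netstat -tan | grep 50010 | grep CLOSE_WAIT | wc -l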
Trouble shooting process for a random lag region issue.
, but the diagnostic approach should help. http://hbase.apache.org/book.html#casestudies.slownode On 1/4/13 10:37 PM, Liu, Raymond raymond@intel.com wrote: Hi I encounter a weird lagging map task issue here: I have a small hadoop/hbase cluster with 1 master node and 4 regionserver nodes, all with 16 CPUs, with map and reduce slots set to 24. A few tables are created with regions distributed evenly on each region node (say 16 regions for each region server). Also each region has almost the same number of KVs with very similar sizes. All tables had major_compact done to ensure data locality. I have an MR job which simply does a local region scan in every map task (so 16 map tasks for each regionserver node). In theory, every map task should finish within a similar time. But the real case is that some regions on the same region server always lag behind a lot, say costing 150-250% of the other map tasks' average time. If this happened on a single region server for every table, I might suspect a disk issue or some other reason bringing down the performance of that region server. But the weird thing is that, though for each single table almost all the lagging map tasks are on the same single regionserver, for different tables this lagging regionserver is different! And the regions and region sizes are distributed evenly, which I double checked many times. (I even tried setting the replica count to 4 to ensure every node has a local copy of the data.) Say for table 1, all map tasks on regionserver node 2 are slow, while for table 2, maybe all map tasks on regionserver node 3 are slow; and for table 1 it will always be regionserver node 2 which is slow regardless of cluster restarts, and the slowest map task will always be the very same one. And it won't go away even if I do major compact again. So, can anyone give me some clue on what might possibly lead to this weird behavior? Any wild guess is welcome! (BTW, I didn't encounter this issue a few days ago with the same tables, while I did restart the cluster and make a few changes to the config file during that period; but restoring the config file doesn't help.) Best Regards, Raymond Liu