On Apr 8, 2013, at 11:31 , JiHyoun Park wrote:

Dear Matthieu

Yes, I am trying to port s4-yarn to 0.6.0.

The -testMode option is what 0.6.0 now has in org.apache.s4.tools.Deploy.java:

            // Explicitly shutdown the JVM since Gradle leaves non-daemon threads running that delay the termination
            if (!deployArgs.testMode) {
                System.exit(0);
            }

just like the -shutdown option in the same class file of S4-25.

            // Explicitly shutdown the JVM since Gradle leaves non-daemon threads running that delay the termination
            if (deployArgs.shutdown) {
                System.exit(0);
            }

But the difference is
        @Parameter(names = "-testMode", description = "Special mode for regression testing", hidden = true)
        @Parameter(names = "-shutdown", description = "Shutdown JVM after deployment. Useful to avoid waiting for remaining long running threads from Gradle", arity = 1)

I tried to pass "-testMode=true" to Deploy.main()

        String [] argDeploy = {"-s4r=" + s4r_path_HDFS,
            "-cluster=" + cluster_name,
            "-appName=" + application_name,
            "-testMode=true"
        };
        Deploy.main(argDeploy);

but got an error.


Cannot parse arguments: class com.beust.jcommander.ParameterException -> Was passed main parameter 'true' but no main parameter was defined

With JCommander, the CLI parser we use, "-testMode" is a boolean parameter with 
an arity of 0. So either you specify it as a bare "-testMode" (it takes the 
value "true"), or you omit it (it keeps its default value, "false").

The error you are reporting might come from that.
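
For illustration, here is a minimal sketch of how the argument array could be 
built so that the flag is passed without a value. The class and method names 
(DeployArgsBuilder, build) are hypothetical and not part of S4; only the option 
names come from the code quoted above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of S4: builds the argument array for Deploy.main().
final class DeployArgsBuilder {

    static String[] build(String s4rUri, String cluster, String appName, boolean testMode) {
        List<String> args = new ArrayList<>();
        args.add("-s4r=" + s4rUri);
        args.add("-cluster=" + cluster);
        args.add("-appName=" + appName);
        // Arity-0 boolean flag: its presence alone means "true", so no "=true" suffix.
        if (testMode) {
            args.add("-testMode");
        }
        return args.toArray(new String[0]);
    }
}
```

The result would then replace the argDeploy array quoted above, for example 
Deploy.main(DeployArgsBuilder.build(s4r_path_HDFS, cluster_name, application_name, true)).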

Hope this helps,

Matthieu



Usage

Usage: <main class> [options]

  Options:
    -a, -appClass                Full class name of the application class
                                 (extending App or AdapterApp)
  * -appName                     Name of S4 application.
  * -c, -cluster                 Logical name of the S4 cluster
    -debug                       Display debug information from the build system
                                 Default: false
    -gradleOpts                  gradle system properties (as in GRADLE_OPTS
                                 environment properties) passed to gradle scripts
                                 Default: []
    -help                        usage
                                 Default: false
    -modulesClasses, -emc, -mc   Fully qualified class names of custom modules
                                 Default: []
    -modulesURIs, -mu            URIs for fetching code of custom modules
                                 Default: []
    -namedStringParameters, -p   Comma-separated list of inline configuration
                                 parameters. Syntax: '-p=name1=value1,name2=value2 '
                                 Default: []
    -s4r                         URI to existing s4r file
    -timeout                     Connection timeout to Zookeeper, in ms
                                 Default: 10000
    -zk                          ZooKeeper connection string
                                 Default: localhost:2181



Best Regards
Jihyoun


On Mon, Apr 8, 2013 at 4:23 PM, Matthieu Morel <[email protected]> wrote:

On Apr 8, 2013, at 06:17 , JiHyoun Park wrote:

Dear Matthieu

What we need for the Yarn integration is just to include the 2 HDFS deployment 
classes, which were developed in S4-25, in the s4 core-deploy package.

- org.apache.s4.deploy.HdfsFetcherModule.java
- org.apache.s4.deploy.HdfsS4RFetcher.java

And a simple modification to org.apache.s4.core.util.RemoteFileFetcher.java, so 
that it can identify "hdfs" as one of the s4r download sources.

        if ("hdfs".equalsIgnoreCase(scheme)){
            return new HdfsArchiveFetcher().fetch(uri);
        }


Hi Jihyoun,

adding Yarn/Hadoop dependencies in s4-core is something we want to avoid, so 
that we don't force a specific version of Hadoop.

Instead, for S4 0.6, we could actually inject the fetchers through a custom 
module. We'd ship the custom module separately from s4-core, avoiding the 
dependency coupling issue.
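
To make the idea concrete, here is a minimal sketch of scheme-based fetcher 
injection. All names in it (ArchiveFetcher, FetcherRegistry, HdfsModuleSketch) 
are hypothetical stand-ins and not S4's actual interfaces, and the "hdfs" 
fetcher is a stub that returns empty content instead of calling Hadoop. It only 
shows that the core can look fetchers up by URI scheme without importing any 
Hadoop class.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical fetcher contract; S4's real interface may differ.
interface ArchiveFetcher {
    InputStream fetch(URI uri);
}

// Hypothetical registry living in the core, with no Hadoop imports.
final class FetcherRegistry {
    private final Map<String, ArchiveFetcher> fetchers = new HashMap<>();

    void register(String scheme, ArchiveFetcher fetcher) {
        fetchers.put(scheme.toLowerCase(Locale.ROOT), fetcher);
    }

    InputStream fetch(URI uri) {
        String scheme = uri.getScheme();
        ArchiveFetcher fetcher =
                scheme == null ? null : fetchers.get(scheme.toLowerCase(Locale.ROOT));
        if (fetcher == null) {
            throw new IllegalArgumentException("No fetcher registered for scheme: " + scheme);
        }
        return fetcher.fetch(uri);
    }
}

// Stand-in for a separately shipped module that contributes the HDFS fetcher.
final class HdfsModuleSketch {
    static void configure(FetcherRegistry registry) {
        // A real module would read the s4r through the Hadoop client here;
        // this stub returns an empty stream so the sketch has no dependencies.
        registry.register("hdfs", uri -> new ByteArrayInputStream(new byte[0]));
    }
}
```

With this shape, only the separately shipped module would carry the Hadoop 
dependency, and the scheme check stays case-insensitive like the 
equalsIgnoreCase test quoted above.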

Can you add a ticket for this? Thanks!



I also would like to ask you one more favour.
Can we have the "-shutdown" option again in org.apache.s4.tools.Deploy.java to 
avoid the automatic shutdown of the S4 application after deployment?
I tried to use the "-testMode" option, which seemed to act just like the 
"-shutdown" option, but my S4 application didn't recognize the option.

If I understand correctly, you tried to port s4-yarn to 0.6.0?
Did you add the -testMode option as a replacement for -shutdown=false here:
https://github.com/apache/incubator-s4/blob/S4-25/subprojects/s4-yarn/src/main/java/org/apache/s4/tools/yarn/S4YarnClient.java#L387 ?

Note that the S4 app being shut down without this option is actually a side 
effect of running the S4 deployment/configuration tool on Yarn: we need to 
prevent System.exit calls since we are running in a contained environment.

Also, if you have a working port of S4-25 to S4 0.6, you could submit a patch 
and we could integrate it. (If you are still iterating, you can also fork the 
project on GitHub and share the code of your port there, so we can provide 
feedback.)

Thanks,

Matthieu



Best Regards
Jihyoun


On Thu, Apr 4, 2013 at 5:26 PM, Matthieu Morel <[email protected]> wrote:
Hi,

Note that S4 0.5 was a complete refactoring, therefore its main objective was 
to provide a functional implementation. Thus there was room for improvements 
and the focus of the 0.6 release was on performance and usability.

Most performance improvements in S4 0.6 come from:
- adding metrics to identify bottlenecks
- improving serialization and deserialization
- minimizing buffer copies (and pressure on the garbage collector)
- leveraging multithreading and async processing, notably by updating Netty 
pipelines

Regards,

Matthieu




On Apr 4, 2013, at 07:01 , Siddharth wrote:

Hi - Can the development team highlight the exact solution/fix that made it 
possible for the 0.6 release to be so fast compared to the earlier release?

Thanks in advance,
Siddharth

________________________________
From: Matthieu Morel [mailto:[email protected]]
Sent: Wednesday, April 03, 2013 3:02 PM
To: [email protected]
Subject: Re: S4-0.6.0 and Hadoop Yarn

On Apr 2, 2013, at 19:46 , Jeryl Cook wrote:


"handle 200K+ messages per sec"  ,in one instance? or do you mean clustered?

This is for processing small events injected into 1 stream on 1 node. By using 
more streams and more nodes, the overall throughput can be considerably higher.

Note that this is a baseline with a basic PE graph (1 injector and 1 PE 
prototype). Performance in practice will be impacted by the complexity of the 
application and the nature of the processing, the hardware and allocated 
resources, the size and complexity of messages, etc.

A benchmarking framework is included in the distribution, so you can reproduce 
the experiments.

Regards,

Matthieu



On Mon, Apr 1, 2013 at 10:42 PM, JiHyoun Park <[email protected]> wrote:
Hi

I am testing the newest release of S4.
It's fantastic that the stream throughput of S4 0.6.0 has been improved to 
handle 200K+ messages per second!
However, it seems that the S4-25 branch - deploying S4 applications with Yarn - 
is not included in the 0.6.0 package yet.
I already built a system to run S4 applications on Yarn and want to migrate its 
S4 framework from 0.5.0 to 0.6.0.
How can I use the 'deploying S4 applications with Yarn' feature on S4 0.6.0?

Best Regards
Jihyoun



--
Jeryl Cook
Founder & Chief Executive Officer
VanitySoft, Inc.
A Geo Business Intelligence Technology Consulting Firm
www.vanity-soft.com<http://www.vanity-soft.com/>
www.linkedin.com/in/jerylcook<http://www.linkedin.com/in/jerylcook>
Get answers to "who knew what, when, and where"... and everything in between.






