On Apr 8, 2013, at 06:17, JiHyoun Park wrote:
> Dear Matthieu
>
> All we need for the Yarn integration is to include 2
> hdfs-deploy-related classes, which were developed in S4-25, in the s4
> core-deploy package.
>
> - org.apache.s4.deploy.HdfsFetcherModule.java
> - org.apache.s4.deploy.HdfsS4RFetcher.java
>
> And a simple modification to org.apache.s4.core.util.RemoteFileFetcher.java so
> that it recognizes "hdfs" as one of the s4r download sources:
>
>     if ("hdfs".equalsIgnoreCase(scheme)) {
>         return new HdfsArchiveFetcher().fetch(uri);
>     }
Hi Jihyoun,
Adding Yarn/Hadoop dependencies to s4-core is something we want to avoid, so
that we don't force a specific version of Hadoop.
Instead, for S4 0.6, we could actually inject the fetchers through a custom
module. We'd ship the custom module separately from s4-core, avoiding the
dependency coupling issue.
Can you add a ticket for this? Thanks!
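
A minimal sketch of the custom-module idea above, assuming the core only needs a lookup from URI scheme to fetcher. `ArchiveFetcher` and `FetcherRegistry` are hypothetical names for illustration, not the S4 0.6 API, and a lambda stands in for the Hadoop-backed fetcher so the example carries no Hadoop dependency:

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical fetcher contract (not the actual S4 interface). A real one would likely stream and declare I/O errors. */
interface ArchiveFetcher {
    byte[] fetch(URI uri);
}

/** Hypothetical registry: the core resolves fetchers by URI scheme and never imports Hadoop classes. */
final class FetcherRegistry {
    private final Map<String, ArchiveFetcher> byScheme = new ConcurrentHashMap<>();

    /** Called by a separately shipped module to contribute a fetcher for one scheme. */
    void register(String scheme, ArchiveFetcher fetcher) {
        byScheme.put(scheme.toLowerCase(Locale.ROOT), fetcher);
    }

    /** Dispatches on the URI scheme, case-insensitively. */
    byte[] fetch(URI uri) {
        String scheme = uri.getScheme();
        if (scheme == null) {
            throw new IllegalArgumentException("URI has no scheme: " + uri);
        }
        ArchiveFetcher fetcher = byScheme.get(scheme.toLowerCase(Locale.ROOT));
        if (fetcher == null) {
            throw new IllegalArgumentException("No fetcher registered for scheme: " + scheme);
        }
        return fetcher.fetch(uri);
    }
}

final class FetcherRegistryDemo {
    public static void main(String[] args) {
        FetcherRegistry registry = new FetcherRegistry();
        // Stand-in for an HDFS-backed fetcher that a separate module would supply.
        registry.register("hdfs", uri -> "fake-s4r-bytes".getBytes(StandardCharsets.UTF_8));
        // The host, port, and path below are made up for the example.
        byte[] archive = registry.fetch(URI.create("hdfs://namenode:8020/apps/demo.s4r"));
        System.out.println("Fetched " + archive.length + " bytes");
    }
}
```

With this shape, only the module that registers the "hdfs" fetcher would need Hadoop on its classpath, so s4-core stays independent of any particular Hadoop version.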
>
> I also would like to ask you one more favour.
> Can we have the "-shutdown" option back in org.apache.s4.tools.Deploy.java
> to avoid automatic shutdown of the S4 application after deployment?
> I tried to use the "-testMode" option, which seemed to act just like the
> "-shutdown" option, but my S4 application didn't recognize the option.
If I understand correctly, you tried to port s4-yarn to 0.6.0?
Did you add the -testMode option as a replacement for -shutdown=false here?
https://github.com/apache/incubator-s4/blob/S4-25/subprojects/s4-yarn/src/main/java/org/apache/s4/tools/yarn/S4YarnClient.java#L387
Note that the S4 app being shut down without this option is actually a side
effect of the deployment/configuration s4 tool on Yarn: we need to prevent
System.exit calls since we are running in a contained environment.
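
One way to picture the System.exit constraint, as a sketch under assumptions: the tool takes its exit behaviour as a parameter, so an embedding process can swap in a handler that records the status instead of terminating the JVM. `DeployTool` and its factory methods are hypothetical and are not the actual org.apache.s4.tools.Deploy code:

```java
import java.util.function.IntConsumer;

/** Hypothetical deploy-style tool with injectable exit behaviour (not the actual S4 Deploy class). */
final class DeployTool {
    private final IntConsumer exitHandler;

    DeployTool(IntConsumer exitHandler) {
        this.exitHandler = exitHandler;
    }

    /** Standalone command-line use: terminate the JVM as usual. */
    static DeployTool standalone() {
        return new DeployTool(System::exit);
    }

    /** Embedded use: store the status in the holder and keep the JVM alive. */
    static DeployTool embedded(int[] statusHolder) {
        return new DeployTool(status -> statusHolder[0] = status);
    }

    void run(boolean deploymentSucceeded) {
        // Deployment work would happen here; this sketch only models the exit step.
        exitHandler.accept(deploymentSucceeded ? 0 : 1);
    }
}

final class DeployToolDemo {
    public static void main(String[] args) {
        int[] status = {-1};
        DeployTool.embedded(status).run(true);
        // Execution continues here because the embedded handler never calls System.exit.
        System.out.println("Recorded exit status: " + status[0]);
    }
}
```

A flag such as -testMode could then select the embedded handler, but how the real tool wires this up is not shown in this thread.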
Also, if you have a working port of S4-25 to S4 0.6, you could submit a patch
and we could integrate it. (If you are still iterating, you can also fork the
project on GitHub and share the code of your port there, so we can provide
feedback.)
Thanks,
Matthieu
>
> Best Regards
> Jihyoun
>
>
> On Thu, Apr 4, 2013 at 5:26 PM, Matthieu Morel <[email protected]> wrote:
> Hi,
>
> Note that S4 0.5 was a complete refactoring, therefore its main objective was
> to provide a functional implementation. Thus there was room for improvements
> and the focus of the 0.6 release was on performance and usability.
>
> Most performance improvements in S4 0.6 come from:
> - adding metrics to identify bottlenecks
> - improving serialization and deserialization
> - minimizing buffer copies (and pressure on the garbage collector)
> - leveraging multithreading and async processing, notably by updating Netty
> pipelines
>
> Regards,
>
> Matthieu
>
> On Apr 4, 2013, at 07:01, Siddharth wrote:
>
>> Hi - Can the development team highlight the exact solution/fix that made it
>> possible for the 0.6 release to be so fast compared to the earlier release?
>>
>> Thanks in advance,
>>
>> Siddharth
>>
>> From: Matthieu Morel [mailto:[email protected]]
>> Sent: Wednesday, April 03, 2013 3:02 PM
>> To: [email protected]
>> Subject: Re: S4-0.6.0 and Hadoop Yarn
>>
>> On Apr 2, 2013, at 19:46, Jeryl Cook wrote:
>>
>> "handle 200K+ messages per sec", in one instance? Or do you mean clustered?
>>
>> This is for processing small events injected into 1 stream on 1 node. By
>> using more streams and more nodes the overall throughput can get considerably
>> higher.
>>
>> Note that this is a baseline with a basic PE graph (1 injector and 1 PE
>> prototype), and performance in practice will be impacted by the complexity of
>> the application and the nature of the processing, the hardware and allocated
>> resources, the size and complexity of messages, etc.
>>
>> A benchmarking framework is included in the distribution, so you can
>> reproduce the experiments.
>>
>> Regards,
>>
>> Matthieu
>>
>>>
>>> On Mon, Apr 1, 2013 at 10:42 PM, JiHyoun Park <[email protected]> wrote:
>>>
>>> Hi
>>>
>>> I am testing the newest release of S4.
>>> It's fantastic that the stream throughput of S4 0.6.0 has been improved to
>>> handle 200K+ messages per sec!
>>> However, it seems that the S4-25 branch - deploying S4 applications with Yarn -
>>> is not included in the 0.6.0 package yet.
>>> I already built a system to run S4 applications on Yarn and want to migrate
>>> its S4 framework from 0.5.0 to 0.6.0.
>>> How can I use the 'deploying S4 applications with Yarn' feature on S4 0.6.0?
>>>
>>> Best Regards
>>> Jihyoun
>>>
>>> --
>>> Jeryl Cook
>>>
>>
>>
>>
>
>