Re: [VOTE] Slack Channel for BookKeeper

2017-07-27 Thread Leigh Stewart
+1

On Thu, Jul 27, 2017 at 11:02 AM, Sijie Guo  wrote:

> Start a vote thread for transferring the DL slack channel to BK.
>
> The proposal is:
>
> - transfer the dl slack channel apachedistributedlog.slack.com to
> apachebookkeeper.slack.com
> - the owner will be BookKeeper PMC (priv...@bookkeeper.apache.org)
>
> One note:
>
> - the slack channel is for informal/immediate discussion. no decisions are
> made in slack channel. decision related discussions should be recorded in
> ASF (either mailing list, JIRA or wiki)
>
> Please vote +1, 0, -1. The vote will be open for 72 hours.
>
> - Sijie
>


Re: [DISCUSS] Slack Channel for BookKeeper

2017-07-24 Thread Leigh Stewart
it

On Mon, Jul 24, 2017 at 6:50 PM, Jia Zhai  wrote:

> It is great to have a slack channel. It makes things more effective and
> smooth.
>
> On Tue, Jul 25, 2017 at 8:11 AM, Sijie Guo  wrote:
>
> > Hi all,
> >
> > What do you guys all think about having a dedicated slack channel for
> > informal discussion for the community? A handful of Apache
> > projects are doing that already, and there are also ways to have a bot that
> > sends daily digest of the conversation to the mailing lists (to keep the
> > records in ASF infrastructure).
> >
> > As the followup steps for merging DL into BookKeeper, we are transferring
> > the DL slack channel to BookKeeper PMC. We can just make it a BK slack
> > channel, and have different channels under it for different discussions.
> >
> > Thoughts?
> >
> > - Sijie
> >
>


Re: [VOTE] Merge DistributedLog as the subproject of Apache BookKeeper

2017-06-12 Thread Leigh Stewart
+1

On Fri, Jun 9, 2017 at 2:48 PM, Uma gangumalla  wrote:

> +1 (binding)
>
> Regards,
> Uma
>
> On Thu, Jun 8, 2017 at 5:21 PM, Sijie Guo  wrote:
>
> > ( /cc bookkeeper dev@ and incubator general@ for awareness )
> >
> > Hi all,
> >
> > There was a joint discussion between BookKeeper PMC and DistributedLog
> PPMC
> > about moving the development of DistributedLog as part of Apache
> > BookKeeper. The reasons behind it are:
> >
> > First, DistributedLog is born as an extension to BookKeeper, to offer
> > continuous log streams as the service. The ledger API in bookkeeper is a
> > lower level API and has learning curves, while the log stream API in
> > distributedlog is a higher level API that simplifies the usage. The
> > combination of ledger API and stream API would offer a better
> > developer/user experience for applications.
> >
> > Secondly, using ledgers to build continuous (re-openable) log stream is a
> > very common pattern for BookKeeper use cases. We did this for HDFS
> namenode
> > journal, for Hedwig, for DistributedLog, and for Pulsar. The same pattern
> > has been implemented again and again. Merging DistributedLog (also
> > ManagedLedger in Pulsar) with BookKeeper will consolidate all the
> > development efforts around this common 'log stream' pattern.
> >
> > Thirdly, the 'log' stream abstraction is a good abstraction for both
> > messaging and streaming. Internally at BookKeeper, there are a few places
> > that can use such 'messaging' facility to improve bookkeeper itself. The
> > log stream in DistributedLog can be used internally at bookkeeper for
> > streaming changes as well.
> >
> > We choose merging DistributedLog as subproject rather than modules. It
> is a
> > softer starting point to avoid disrupting the folks who are depending on
> > the ledger api alone. The BookKeeper PMC and DistributedLog PPMC have
> > achieved initial consensus on this merge. There is an official VOTE
> ongoing
> > in bookkeeper PMC. We'd like to bring this to the distributedlog
> community
> > for a community vote following the guidelines here
> > .
> >
> > Please vote +1 if in favor of merging DistributedLog to BookKeeper, and
> -1
> > if not. The vote will be open until Tuesday 13th June, 18:00 PST.
> >
> > - Sijie
> >
>


Re: DistributedLog Podling Report Draft - May 2017

2017-05-01 Thread Leigh Stewart
+1

On Mon, May 1, 2017 at 9:43 AM, Sijie Guo <guosi...@gmail.com> wrote:

> Hi, all,
>
> Here is the draft of podling report for May 2017. Please help review it.
>
> =
>
> DistributedLog
>
> DistributedLog is a high-performance replicated log service. It offers
> durability, replication and strong consistency, which provides a
> fundamental building block for building reliable distributed systems.
> DistributedLog has been incubating since 2016-06-24.
>
> Three most important issues to address in the move towards graduation:
>
> 1.Continue to grow the community, and increase diversity of community.
> 2.Improve documentation, including documentation of project and processes.
> 3.Successful releases.
>
> Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware
> of?
>
> None.
>
> How has the community developed since the last report?
>
> 1. community
>
> - Sijie gave a talk about DistributedLog at Strata+Hadoop San Jose.
> - The DistributedLog paper is accepted at ICDE 2017.
> - Leigh Stewart gave a presentation about DistributedLog at ICDE.
>
> 2. 44 people subscribed to dev mail list,
> 8 improvement proposal in progress,
> and 72 open issues.
>
> How has the project developed since the last report?
>
> - We have released the first Apache version : 0.4.0-incubating on April 25.
> - We start the work on release 0.5.0-incubating.
>
> - Sijie
>


Re: [DISCUSS] DP-8: Symlinked Log

2017-04-18 Thread Leigh Stewart
+ Dan

On Tue, Apr 18, 2017 at 1:04 PM, Sijie Guo  wrote:

> Hi all,
>
> I created a proposal for supporting symlinks in Dlog, in order to migrate
> a namespace from flat namespace to a hierarchical namespace.
>
> https://cwiki.apache.org/confluence/display/DL/DP-8+-+Symlinked+Log
>
> Please take a look and let me know your thoughts.
>
> - Sijie
>
>


Re: [VOTE] Release 0.4.0, release candidate #3

2017-04-02 Thread Leigh Stewart
+1

On Mar 31, 2017 11:03 AM, "Yiming Zang"  wrote:

> +1
>
> Same as Jia Zhai, build succeeds after adding
> "src/main/resources/DISCLAIMER.bin.txt"
> at line 232 of pom.xml
>
> MD5 and SHA look good.
>
> dlog tool can run without any issue from the binary.
>
>
> On Thu, Mar 30, 2017 at 5:27 PM, Jia Zhai  wrote:
>
> > +1 binding.
> >
> > "mvn apache-rat:check package findbugs:check -DskipTests" executes
> > successfully after excluding 1 file from the rat check, by adding this line at
> > line 232 of pom.xml
> > ```
> > <exclude>src/main/resources/DISCLAIMER.bin.txt</exclude>
> > ```
> >
> > On Thu, Mar 30, 2017 at 12:46 AM, Sijie Guo  wrote:
> >
> > > I think apache-rat:check will fail because of
> > > https://issues.apache.org/jira/browse/DL-195. But I don't think it is
> a
> > > blocker for the release.
> > >
> > > - Sijie
> > >
> > > On Tue, Mar 28, 2017 at 3:29 PM, Sijie Guo  wrote:
> > >
> > > > Hi all,
> > > >
> > > > Please review and vote on the release candidate #3 for the version
> > 0.4.0,
> > > > as follows:
> > > >
> > > > [ ] +1, Approve the release
> > > > [ ] -1, Do not approve the release (please provide specific comments)
> > > >
> > > > The complete staging area is available for your review, which
> includes:
> > > >
> > > > * JIRA release notes [1],
> > > > * the official Apache source release to be deployed to
> > > dist.apache.org
> > > >  [2],
> > > > * all artifacts to be deployed to the Maven Central Repository
> [3],
> > > > * source code tag "v0.4.0-incubating-RC1_2.11" (for scala 2.11)
> and
> > > > "v0.4.0-incubating-RC1_2.10" (for scala 2.10) [4][5],
> > > > * website pull request listing the release [6] and publishing the
> > API
> > > > reference manual.
> > > >
> > > > > A simple instruction for validating the source and binary packages.
> > > >
> > > > - source package: building the package with "*mvn clean
> > apache-rat:check
> > > > package findbugs:check -DskipTests*"
> > > >
> > > > The vote will be open for at least 72 hours. It is adopted by
> majority
> > > > approval, with at least 3 PPMC affirmative votes.
> > > >
> > > > Thanks,
> > > > Sijie
> > > >
> > > > [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?
> > > > version=12337980==12320620
> > > > [2] https://dist.apache.org/repos/dist/dev/incubator/
> > > distributedlog/0.4.0-
> > > > incubating-RC3/
> > > > [3] https://repository.apache.org/content/repositories/
> > > > orgapachedistributedlog-1006/
> > > > [4] https://github.com/apache/incubator-distributedlog/tree/
> > > > v0.4.0-incubating-RC3_2.11
> > > > [5] https://github.com/apache/incubator-distributedlog/tree/
> > > > v0.4.0-incubating-RC3_2.10
> > > > [6] https://github.com/apache/incubator-distributedlog/pull/109
> > > >
> > >
> >
>


Re: [VOTE] Release 0.4.0, release candidate #2

2017-01-17 Thread Leigh Stewart
+1

On Mon, Jan 16, 2017 at 8:13 AM, Jon Derrick 
wrote:

> +1
>
> LGTM. compiled the source packages and ran dbench. the license files look
> good.
>
> - jd
>
> On Tue, Jan 10, 2017 at 11:56 PM, Sijie Guo  wrote:
>
> > Hi all,
> >
> > Please review and vote on the release candidate #2 for the version 0.4.0,
> > as follows:
> >
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> > The complete staging area is available for your review, which includes:
> >
> > * JIRA release notes [1],
> > * the official Apache source release to be deployed to
> dist.apache.org
> >  [2],
> > * all artifacts to be deployed to the Maven Central Repository
> [3][4],
> > * source code tag "v0.4.0-incubating-RC1_2.11" (for scala 2.11) and
> > "v0.4.0-incubating-RC1_2.10" (for scala 2.10) [5][6],
> > * website pull request listing the release [7] and publishing the API
> > reference manual.
> >
> > A simple instruction for validating the source and binary packages.
> >
> > - source package: building the package with "*mvn clean apache-rat:check
> > package findbugs:check -DskipTests*"
> >
> > The vote will be open for at least 72 hours. It is adopted by majority
> > approval, with at least 3 PPMC affirmative votes.
> >
> > Thanks,
> > Sijie
> >
> > [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?
> > projectId=12320620=12337980
> > [2]
> > https://dist.apache.org/repos/dist/dev/incubator/distributedlog/0.4.0-
> > incubating-RC2/
> > [3]
> > https://repository.apache.org/content/repositories/
> > orgapachedistributedlog-1003/
> > [4]
> > https://repository.apache.org/content/repositories/
> > orgapachedistributedlog-1004/
> > [5]
> > https://github.com/apache/incubator-distributedlog/tree/
> > v0.4.0-incubating-RC1_2.11
> > [6]
> > https://github.com/apache/incubator-distributedlog/tree/
> > v0.4.0-incubating-RC1_2.10
> > [7] https://github.com/apache/incubator-distributedlog/pull/109
> >
>
>
>
> --
> - jderrick
>


[jira] [Commented] (DL-102) Add routing service to write proxy server side

2016-12-21 Thread Leigh Stewart (JIRA)

[ 
https://issues.apache.org/jira/browse/DL-102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15767568#comment-15767568
 ] 

Leigh Stewart commented on DL-102:
--

Serverside depends on TwitterRegionResolver

So I guess we'll need to use the generic region resolver here.

> Add routing service to write proxy server side
> --
>
> Key: DL-102
> URL: https://issues.apache.org/jira/browse/DL-102
> Project: DistributedLog
>  Issue Type: Improvement
>Reporter: Sijie Guo
>    Assignee: Leigh Stewart
>
> this change is to add getOwner rpc in write proxy. so we can change the 
> client side to get owner from write proxy first for
>  routing service. in this way, we can start experimenting with any resource 
> placement algorithms.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Is there a REST API for Write and Read proxies?

2016-12-21 Thread Leigh Stewart
The challenge is the protocol uses a redirection mechanism so there's a
protocol beyond thrift/transport.

As Jay says we plan to make improvements in this area.

On Wed, Dec 21, 2016 at 8:53 AM, Jay Juma  wrote:

> Hi Asko,
>
> I don't think there is a REST api available in the proxy service. The API
> seems to be thrift-rpc based. I found there is a JIRA to support gRPC wire
> protocol. It should not be difficult to add a REST api.
>
> - Jay
>
> On Wed, Dec 21, 2016 at 4:29 AM, Asko Kauppi 
> wrote:
>
> > I’m reading http://distributedlog.incubator.apache.org/docs/
> > latest/user_guide/api/proxy.html
> >
> > Ideally, I wouldn’t need to use a library to talk to a proxy service,
> > right? Is there documentation on how to access the proxies as REST
> > endpoints / are they such?
> >
> > My preferred environment is Scala and akka-http.
> >
> > Asko Kauppi
> > Zalando Tech Helsinki
> >
> >
>


[jira] [Updated] (DL-97) Remove unused methods in BKLogHandler

2016-12-21 Thread Leigh Stewart (JIRA)

 [ 
https://issues.apache.org/jira/browse/DL-97?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leigh Stewart updated DL-97:

Assignee: Sijie Guo  (was: Leigh Stewart)

> Remove unused methods in BKLogHandler
> -
>
> Key: DL-97
> URL: https://issues.apache.org/jira/browse/DL-97
> Project: DistributedLog
>  Issue Type: Sub-task
>  Components: distributedlog-core
>Reporter: Sijie Guo
>Assignee: Sijie Guo
> Fix For: 0.4.0
>
>
> Remove unused methods in BKLogHandler



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: DistributedLog Podling Report Draft - December 2016

2016-12-07 Thread Leigh Stewart
#shipit

On Wed, Dec 7, 2016 at 7:28 PM, Sijie Guo 
wrote:

> If no objections, I will submit the proposal tonight.
>
> - Sijie
>
> On Tue, Dec 6, 2016 at 7:04 PM, Khurrum Nasim 
> wrote:
>
> > +1 also excited about the first release.
> >
> > - KN
> >
> > On Tue, Dec 6, 2016 at 5:51 PM, Jia Zhai  wrote:
> >
> > > +1 LGTM.
> > > Looking forward to the first release.
> > >
> > > On Wed, Dec 7, 2016 at 12:19 AM, Sijie Guo  wrote:
> > >
> > > > Hi, all,
> > > >
> > > > Here is the draft of podling report. Please help review it.
> > > >
> > > > =
> > > >
> > > > DistributedLog
> > > >
> > > > DistributedLog is a high-performance replicated log service. It
> offers
> > > > durability, replication and strong consistency, which provides a
> > > > fundamental building block for building reliable distributed systems.
> > > > DistributedLog has been incubating since 2016-06-24.
> > > >
> > > > Three most important issues to address in the move towards
> graduation:
> > > >
> > > > 1.Continue to grow the community, and increase diversity of
> community.
> > > > 2.Improve documentation, including documentation of project and
> > > processes.
> > > > 3.Successful releases.
> > > >
> > > > Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
> > > aware
> > > > of?
> > > > None
> > > >
> > > > How has the community developed since the last report?
> > > > 1. Increase in contributions from community.
> > > >- 20 created and 9 resolved issues in community JIRA in November.
> > > > 2. Lots of engagements on feature proposals.
> > > > 3. Increased traffic on the mailing list, in particular, due to
> > > committers
> > > > engaging more actively with contributors.
> > > > >- we have 36 people subscribed to the mail list.
> > > >- 192 messages to distributedlog mail list in November.
> > > >
> > > > How has the project developed since the last report?
> > > >
> > > > The community is working on the first release. The first release will
> > be
> > > in
> > > > mid December.
> > > >
> > > > - Sijie
> > > >
> > >
> >
>


[jira] [Commented] (DL-10) Please document how to run distributedlog-benchmark #26

2016-11-14 Thread Leigh Stewart (JIRA)

[ 
https://issues.apache.org/jira/browse/DL-10?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15664357#comment-15664357
 ] 

Leigh Stewart commented on DL-10:
-

Pull request: https://github.com/apache/incubator-distributedlog/pull/44

> Please document how to run distributedlog-benchmark #26
> ---
>
> Key: DL-10
> URL: https://issues.apache.org/jira/browse/DL-10
> Project: DistributedLog
>  Issue Type: New Feature
> Environment: Cloned from 
> https://github.com/twitter/distributedlog/issues/26
>Reporter: Sijie Guo
>    Assignee: Leigh Stewart
> Fix For: 0.4.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DL-35) Document on how to setup a global DL cluster

2016-11-14 Thread Leigh Stewart (JIRA)

 [ 
https://issues.apache.org/jira/browse/DL-35?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leigh Stewart resolved DL-35.
-
Resolution: Fixed

> Document on how to setup a global DL cluster
> 
>
> Key: DL-35
> URL: https://issues.apache.org/jira/browse/DL-35
> Project: DistributedLog
>  Issue Type: Task
>  Components: documentation
>Reporter: Sijie Guo
>    Assignee: Leigh Stewart
> Fix For: 0.4.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DL-35) Document on how to setup a global DL cluster

2016-11-14 Thread Leigh Stewart (JIRA)

[ 
https://issues.apache.org/jira/browse/DL-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15664154#comment-15664154
 ] 

Leigh Stewart commented on DL-35:
-

Checked in, still need to deploy the latest site.

> Document on how to setup a global DL cluster
> 
>
> Key: DL-35
> URL: https://issues.apache.org/jira/browse/DL-35
> Project: DistributedLog
>  Issue Type: Task
>  Components: documentation
>Reporter: Sijie Guo
>    Assignee: Leigh Stewart
> Fix For: 0.4.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (DL-35) Document on how to setup a global DL cluster

2016-11-14 Thread Leigh Stewart (JIRA)

 [ 
https://issues.apache.org/jira/browse/DL-35?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leigh Stewart closed DL-35.
---

> Document on how to setup a global DL cluster
> 
>
> Key: DL-35
> URL: https://issues.apache.org/jira/browse/DL-35
> Project: DistributedLog
>  Issue Type: Task
>  Components: documentation
>Reporter: Sijie Guo
>    Assignee: Leigh Stewart
> Fix For: 0.4.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Use DL stream to store offsets?

2016-11-02 Thread Leigh Stewart
We have in fact built something like this. No plans as yet to release, but
I think we would like to eventually.

On Wed, Nov 2, 2016 at 2:43 AM, Khurrum Nasim 
wrote:

> As part of implementing the kafka subscriber interface, I am wondering
> whether anyone uses a DL stream for storing the offsets.
>
> For example, if I have N streams (0..N-1), I need to track the read offset
> for each stream and store them somewhere. I can probably use other external
> services (like any key/value store) to store the offset. But it would
> introduce extra dependencies. I am thinking if I have a map of <stream, offset> and periodically flush the map into a separate stream (let's say
> __offset_ stream). With proper truncation/checkpoint mechanism, it would be
> very fast.
>
> This use case here is a very standard replicated state machine. I am also
> wondering whether you guys have thought of providing a common library on
> distributedlog to simplify implementing state machines.
>
> - KN
>
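
The periodic-flush idea in the question above could look roughly like the following. This is a minimal in-memory sketch, not DistributedLog code: `OffsetLog`, `append`, `truncate_before` and `recover` are hypothetical names, and a plain Python list stands in for the separate offsets stream. It assumes every flushed record is a full snapshot of the per-stream read offsets, so recovery only needs the newest record and everything older can be truncated.

```python
from typing import Dict, List


class OffsetLog:
    """Toy stand-in for a dedicated offsets stream (hypothetical, not a DL API)."""

    def __init__(self) -> None:
        self._base = 0  # log position of the first retained record
        self._records: List[Dict[str, int]] = []

    def append(self, offsets: Dict[str, int]) -> int:
        """Flush a full snapshot of the offset map; return its log position."""
        self._records.append(dict(offsets))
        return self._base + len(self._records) - 1

    def truncate_before(self, position: int) -> None:
        """Drop every snapshot older than `position` (checkpoint truncation)."""
        drop = max(0, min(position - self._base, len(self._records)))
        del self._records[:drop]
        self._base += drop

    def recover(self) -> Dict[str, int]:
        """Rebuild reader state from the newest snapshot, or empty if none."""
        return dict(self._records[-1]) if self._records else {}

    def __len__(self) -> int:
        return len(self._records)
```

Because each record is a complete snapshot, a restarting reader calls `recover()` once and resumes every stream from the returned offsets; writing deltas instead would shrink the records but require replaying from the last checkpoint.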


Re: hundreds of millions of streams?

2016-10-28 Thread Leigh Stewart
Got it. We probably can't support that scale at this time.
Curious: do you resort to sharing streams among objects with systems that
don't support 100s millions of streams? (i.e. partitioning objects across
streams?)

On Fri, Oct 28, 2016 at 8:24 AM, Poule Dodue <pouledo...@hotmail.com> wrote:

> yes aka ES/CQRS
>
> some links:
>
> https://msdn.microsoft.com/en-us/library/jj554200.aspx
> http://williamverdolini.github.io/2014/08/11/cqrses-architecture/
> http://docs.geteventstore.com/introduction/3.9.0/event-sourcing-basics/
>
> it needs a lot of streams to basically replay events for any entity on a
> system.
>
> example: i could replay events for all changes that happened in 1 Cart of
> 1 User:
>
>
> (read events from stream "cart-of-user-233293111" ):
>
> 1- added item X
> 2- deleted item X
> 3- added item Y
> 
>
> by replaying that stream, I can rebuild a user's cart state
>
>
> > Le 28 oct. 2016 à 10:13, Leigh Stewart <lstew...@twitter.com.INVALID> a
> écrit :
> >
> > Poule- would you mind sharing some information on Event Sourcing? Are you
> > referring to something like
> > http://martinfowler.com/eaaDev/EventSourcing.html ?
> >
> > On Fri, Oct 28, 2016 at 7:11 AM, Leigh Stewart <lstew...@twitter.com>
> wrote:
> >
> >> DL is not able to handle 100s of millions of streams. 10^5-10^6 is
> probably
> >> ok.
> >> ZK is probably the biggest challenge (we are looking at ways to
> eliminate
> >> this as we would like to scale to 10^6-10^7 in the not too distant
> future),
> >> but 100s of millions is so far beyond what we've worked with there would
> >> likely be other scaling challenges on the way to that point.
> >>
> >> On Fri, Oct 28, 2016 at 5:56 AM, Poule Dodue <pouledo...@hotmail.com>
> >> wrote:
> >>
> >>> In Event Sourcing, we need to have 1 stream per entity/aggregate so for
> >>> a typical prod system it means we need hundreds of millions of streams.
> >>>
> >>> Is DL able to handle that or is it limited to, say, a few hundred
> >>> thousand streams?
> >>>
> >>>
> >>>
> >>
>
>
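
The cart example quoted above (rebuilding state by replaying one entity's stream) can be sketched as below. This is an illustration only, under stated assumptions: `replay_cart` and the `(action, item)` tuple shape are hypothetical, and the event list stands in for reading a stream such as "cart-of-user-233293111" from the start.

```python
from typing import Iterable, Set, Tuple


def replay_cart(events: Iterable[Tuple[str, str]]) -> Set[str]:
    """Fold an ordered event stream into the current cart contents."""
    cart: Set[str] = set()
    for action, item in events:
        if action == "added":
            cart.add(item)
        elif action == "deleted":
            cart.discard(item)  # deleting an absent item is a no-op
        else:
            raise ValueError(f"unknown event type: {action!r}")
    return cart


# The three events from the example, in stream order.
events = [("added", "X"), ("deleted", "X"), ("added", "Y")]
print(replay_cart(events))
```

Since the state is a pure function of the ordered events, one stream per entity keeps replay cheap, which is why this pattern asks for so many streams.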


Re: [Discuss] Transaction Support

2016-10-28 Thread Leigh Stewart
Interesting proposal. A couple quick notes while you continue to flesh this
out.

a. just to be sure - does this eliminate the need to save seqno with
checkpoint?

b. i.e. another way to describe this kind of improvement is "support
records (atomic writes) larger than 1MB", iiuc. the advantage being it
avoids the baggage of transactions. disadvantages include inability to do
cross stream transactions, and flexibility (interleaving, etc) (are there
others?).

c. proxy use case is for supporting multiple writers - have you thought
about how this would work with multiple writers?

Thanks!


On Tue, Oct 18, 2016 at 6:45 PM, Sijie Guo 
wrote:

> Sound good to me. look forward to the detailed proposal.
>
> (I don't mind the format if it makes things easier to you)
>
> Sijie
>
> On Friday, October 14, 2016, Xi Liu  wrote:
>
> > Thank you, Sijie
> >
> > We have some internal discussions to sort out some details. We are ready
> to
> > collaborate with the community for adding the transaction support in DL.
> > We'd like to share more.
> >
> > I created a proposal wiki here -
> > https://cwiki.apache.org/confluence/display/DL/DP-1+-+
> > DistributedLog+Transaction+Support
> >
> > (I followed KIP format and named it as DP (DistributedLog Proposal - DP
> is
> > also short for Dynamic Programming). I don't know if you guys like this
> > name or not. Feel free to change it :D)
> >
> > I basically put my initial email as the content there so far. Once we
> > finished our final discussion, I will update with more details. At the
> same
> > time, any comments are welcome.
> >
> > - Xi
> >
> >
> >
> > On Sat, Oct 8, 2016 at 6:58 AM, Sijie Guo  >
> > wrote:
> >
> > > Xi,
> > >
> > > I just granted you the edit permission.
> > >
> > > - Sijie
> > >
> > > On Fri, Oct 7, 2016 at 10:34 AM, Xi Liu  > > wrote:
> > >
> > > > I still can not edit the wiki. Can any of the pmc members grant me
> the
> > > > permissions?
> > > >
> > > > - Xi
> > > >
> > > > On Sat, Sep 17, 2016 at 10:35 PM, Xi Liu  > > wrote:
> > > >
> > > > > Sijie,
> > > > >
> > > > > I attempted to create a wiki page under that space. I found that I
> am
> > > not
> > > > > authorized with edit permission.
> > > > >
> > > > > Can any of the committers grant me the wiki edit permission? My
> > account
> > > > is
> > > > > "xi.liu.ant".
> > > > >
> > > > > - Xi
> > > > >
> > > > >
> > > > > On Tue, Sep 13, 2016 at 9:26 AM, Sijie Guo  > > wrote:
> > > > >
> > > > >> This sounds interesting ... I will take a closer look and give my
> > > > comments
> > > > >> later.
> > > > >>
> > > > >> At the same time, do you mind creating a wiki page to put your
> idea
> > > > there?
> > > > >> You can add your wiki page under
> > > > >> https://cwiki.apache.org/confluence/display/DL/Project+Proposals
> > > > >>
> > > > >> You might need to ask in the dev list to grant the wiki edit
> > > permissions
> > > > >> to
> > > > >> you once you have a wiki account.
> > > > >>
> > > > >> - Sijie
> > > > >>
> > > > >>
> > > > >> On Mon, Sep 12, 2016 at 2:20 AM, Xi Liu  > > wrote:
> > > > >>
> > > > >> > Hello,
> > > > >> >
> > > > >> > I asked the transaction support in distributedlog user group two
> > > > months
> > > > >> > ago. I want to raise this up again, as we are looking for using
> > > > >> > distributedlog for building a transactional data service. It is
> a
> > > > major
> > > > >> > feature that is missing in distributedlog. We have some ideas to
> > add
> > > > >> this
> > > > >> > to distributedlog and want to know if they make sense or not. If
> > > they
> > > > >> are
> > > > >> > good, we'd like to contribute and develop with the community.
> > > > >> >
> > > > >> > Here are the thoughts:
> > > > >> >
> > > > >> > -
> > > > >> >
> > > > >> > From our understanding, DL can provide "at-least-once" delivery
> > > > semantic
> > > > >> > (if not, please correct me) but not "exactly-once" delivery
> > > semantic.
> > > > >> That
> > > > >> > means that a message can be delivered one or more times if the
> > > reader
> > > > >> > doesn't handle duplicates.
> > > > >> >
> > > > >> > The duplicates come from two places, one is at writer side (this
> > > > assumes
> > > > >> > using write proxy not the core library), while the other one is
> at
> > > > >> reader
> > > > >> > side.
> > > > >> >
> > > > >> > - writer side: if the client attempts to write a record to the
> > write
> > > > >> > proxies and gets a network error (e.g timeouts) then retries,
> the
> > > > >> retrying
> > > > >> > will potentially result in duplicates.
> > > > >> > - reader side:if the reader reads a message from a stream and
> then
> > > > >> crashes,
> > > > >> > when the reader restarts it would restart from last known
> position
> > 

Re: hundreds of millions of streams?

2016-10-28 Thread Leigh Stewart
Poule- would you mind sharing some information on Event Sourcing? Are you
referring to something like
http://martinfowler.com/eaaDev/EventSourcing.html ?

On Fri, Oct 28, 2016 at 7:11 AM, Leigh Stewart <lstew...@twitter.com> wrote:

> DL is not able to handle 100s of millions of streams. 10^5-10^6 is probably
> ok.
> ZK is probably the biggest challenge (we are looking at ways to eliminate
> this as we would like to scale to 10^6-10^7 in the not too distant future),
> but 100s of millions is so far beyond what we've worked with there would
> likely be other scaling challenges on the way to that point.
>
> On Fri, Oct 28, 2016 at 5:56 AM, Poule Dodue <pouledo...@hotmail.com>
> wrote:
>
>> In Event Sourcing, we need to have 1 stream per entity/aggregate so for
>> a typical prod system it means we need hundreds of millions of streams.
>>
>> Is DL able to handle that or is it limited to, say, a few hundred
>> thousand streams?
>>
>>
>>
>


Re: hundreds of millions of streams?

2016-10-28 Thread Leigh Stewart
DL is not able to handle 100s of millions of streams. 10^5-10^6 is probably
ok.
ZK is probably the biggest challenge (we are looking at ways to eliminate
this as we would like to scale to 10^6-10^7 in the not too distant future),
but 100s of millions is so far beyond what we've worked with there would
likely be other scaling challenges on the way to that point.

On Fri, Oct 28, 2016 at 5:56 AM, Poule Dodue  wrote:

> In Event Sourcing, we need to have 1 stream per entity/aggregate so for
> a typical prod system it means we need hundreds of millions of streams.
>
> Is DL able to handle that or is it limited to, say, a few hundred thousand
> streams?
>
>
>


Re: Proxy Client - Batch Ordering / Commit

2016-10-05 Thread Leigh Stewart
>
> So, my basic question is if this is currently possible in the proxy? I
> don't believe it gives these guarantees as it stands today, but I am not
> 100% sure of how all of the futures in the code handle failures.
>

As long as you use this method

to write this is possible.

The writeBulk is not atomic and we will probably deprecate it at some
point.

> If not, where in the code would be the relevant places to add the ability
> to do this, and would the project be interested in a pull request?


Does the example linked above meet your requirements?

Thx

On Tue, Oct 4, 2016 at 12:39 PM, Cameron Hatfield  wrote:

> I have a question about the Proxy Client. Basically, for our use cases, we
> want to guarantee ordering at the key level, irrespective of the ordering
> of the partition it may be assigned to as a whole. Due to the source of the
> data (HBase Replication), we cannot guarantee that a single partition will
> be owned for writes by the same client. This means the proxy client works
> well (since we don't care which proxy owns the partition we are writing
> to).
>
>
> However, the guarantees we need when writing a batch consist of:
> Definition of a Batch: The set of records sent to the writeBatch endpoint
> on the proxy
>
> 1. Batch success: If the client receives a success from the proxy, then
> that batch is successfully written
>
> 2. Inter-Batch ordering : Once a batch has been written successfully by the
> client, when another batch is written, it will be guaranteed to be ordered
> after the last batch (if it is the same stream).
>
> 3. Intra-Batch ordering: Within a batch of writes, the records will be
> committed in order
>
> 4. Intra-Batch failure ordering: If an individual record fails to write
> within a batch, all records after that record will not be written.
>
> 5. Batch Commit: Guarantee that if a batch returns a success, it will be
> written
>
> 6. Read-after-write: Once a batch is committed, within a limited time-frame
> it will be able to be read. This is required in the case of failure, so
> that the client can see what actually got committed. I believe the
> time-frame part could be removed if the client can send in the same
> sequence number that was written previously, since it would then fail and
> we would know that a read needs to occur.
>
>
> So, my basic question is if this is currently possible in the proxy? I
> don't believe it gives these guarantees as it stands today, but I am not
> 100% sure of how all of the futures in the code handle failures.
> If not, where in the code would be the relevant places to add the ability
> to do this, and would the project be interested in a pull request?
>
>
> Thanks,
> Cameron
>
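
To make requirements 3 and 4 from the list above concrete (records committed in order, and nothing after the first failed record is written), here is a toy sketch. It is not how the write proxy is implemented: `write_batch`, the list-backed log, and the `can_write` predicate are all hypothetical stand-ins.

```python
from typing import Callable, List


def write_batch(log: List[str], records: List[str],
                can_write: Callable[[str], bool]) -> int:
    """Append records in order; stop at the first failure.

    Returns how many records were committed, so the caller knows
    exactly which prefix of the batch made it into the log.
    """
    committed = 0
    for record in records:
        if not can_write(record):
            break  # intra-batch failure ordering: skip everything after
        log.append(record)
        committed += 1
    return committed


log: List[str] = []
count = write_batch(log, ["a", "b", "bad", "c"], lambda r: r != "bad")
print(count, log)
```

Note that this only gives prefix semantics; an atomic batch (all records or none) would need the whole batch to be written as a single record.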


Re: Question about replication repair

2016-09-30 Thread Leigh Stewart
Streams are segmented across sets of bookies called ensembles.

http://distributedlog.incubator.apache.org/docs/latest/basics/introduction#log-segments

So loss of 3 bookies (assuming ensemble size 3) would never result in loss
of an entire stream.

Regardless, losing all replicas is non-recoverable if it happens all at
once. If we lose all copies there's nothing to repair from.

This scenario should be rare though. In production it is advised to run the
BookKeeper ReReplicator, which continuously monitors for under replicated
ledgers, and repairs the quorum continuously.

As long as there is enough time to copy data between bookie deaths, you'd
be fine.


On Thu, Sep 29, 2016 at 9:57 PM, Jay Juma  wrote:

> Can anyone share more details about how DL repairs replicas?
>
> For example, if a stream is stored in bookie A, B, C and C is gone forever.
> How does DL handle it?
>
> - Jay
>
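
The point in the reply above, that a stream's log segments are spread across different ensembles, can be illustrated with a toy model. This is a simplified sketch under loud assumptions: the segment names, bookie names and helper functions are hypothetical, and every bookie in an ensemble is assumed to hold a full copy of the segment (ensemble size equal to the replication factor).

```python
from typing import Dict, Iterable, List, Set


def lost_segments(ensembles: Dict[str, Set[str]],
                  dead: Iterable[str]) -> List[str]:
    """Segments whose every replica sits on a dead bookie (unrecoverable)."""
    dead_set = set(dead)
    return [seg for seg, bookies in ensembles.items() if bookies <= dead_set]


def under_replicated(ensembles: Dict[str, Set[str]],
                     dead: Iterable[str]) -> List[str]:
    """Segments that lost some replicas but still have one to repair from."""
    dead_set = set(dead)
    return [seg for seg, bookies in ensembles.items()
            if bookies & dead_set and not bookies <= dead_set]


# One stream, three log segments, each on its own ensemble of 3 bookies.
stream = {
    "segment-1": {"A", "B", "C"},
    "segment-2": {"B", "D", "E"},
    "segment-3": {"A", "D", "F"},
}
print(lost_segments(stream, {"C"}))            # nothing lost outright
print(under_replicated(stream, {"C"}))         # candidates for re-replication
print(lost_segments(stream, {"A", "B", "C"}))  # all three gone at once
```

In this model, losing bookie C alone leaves segment-1 under-replicated but repairable, while losing A, B and C together before any repair loses segment-1 only; the other segments still have surviving replicas.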


Re: tutorial about setting up a global replicated log

2016-09-30 Thread Leigh Stewart
I am currently working on this (busy production week) - follow here:
https://issues.apache.org/jira/browse/DL-35

On Thu, Sep 29, 2016 at 9:58 PM, Jay Juma  wrote:

> Can anyone help on this?
>
> - Jay
>
> On Sun, Sep 18, 2016 at 11:17 AM, Jay Juma  wrote:
>
> > Hello,
> >
> > Do you have a tutorial of setting up a global replicated log cluster?
> >
> > Thanks,
> > Jay
> >
>


Re: user mail list

2016-09-30 Thread Leigh Stewart
There's an active pull request from Sijie which contains instructions for
how to build the website:

https://github.com/apache/incubator-distributedlog/pull/30

On Thu, Sep 29, 2016 at 9:50 PM, Khurrum Nasim 
wrote:

> I confirmed that I can subscribe to the user@d.i.a.o list. Thanks Chris.
>
> I think we should also update the website to add the new user list. Does
> anyone know how to update the website? I can send out a pull request for it.
>
> - KN
>
> On Wed, Sep 21, 2016 at 5:44 AM, Chris Nauroth 
> wrote:
>
> > I just tried to submit the infrastructure request for a user@ list, but
> > it told me that the list already exists.  I guess another mentor beat me
> to
> > it.  You can subscribe by sending an email to
> > user-subscr...@distributedlog.incubator.apache.org.
> >
> > --Chris Nauroth
> >
> > On 9/17/16, 8:18 PM, "Jay Juma"  wrote:
> >
> > +1 for a separated user mail list.
> >
> > It is convenient for newbies like me. And most of the development
> > discussions
> > are irrelevant to me.
> >
> > - Jay
> >
> > On Sun, Sep 18, 2016 at 10:12 AM, Khurrum Nasim <
> > khurrumnas...@gmail.com>
> > wrote:
> >
> > > Chris & Sijie,
> > >
> > >
> > > I felt it is worth separating the user and dev discussion into two
> > mail
> > > lists. As from subscribers' perspective, I can easily setup
> > filtering rules
> > > on two different mail lists and get all the information that I
> > really care
> > > about.
> > >
> > > I'd prefer setting up a separated user list if we can.
> > >
> > > KN
> > >
> > > On Tue, Sep 13, 2016 at 1:07 PM, Chris Nauroth <
> > cnaur...@hortonworks.com>
> > > wrote:
> > >
> > > > Typically podlings start out using the dev@ list to field user
> > questions
> > > > and then split out a separate user@ list later, only if traffic
> > from
> > > user
> > > > questions is sufficient to warrant the split.  There is no rule
> > about
> > > this
> > > > though.  If your community prefers a separate user@ list even
> > now, then
> > > > that’s fine, and mentors can help set that up.
> > > >
> > > > --Chris Nauroth
> > > >
> > > > On 9/12/16, 6:30 PM, "Sijie Guo"  wrote:
> > > >
> > > > Any suggestions from mentors? Maybe it is a good time to have
> > a user
> > > > mail
> > > > list?
> > > >
> > > > - Sijie
> > > >
> > > > On Sat, Sep 10, 2016 at 11:21 PM, Sijie Guo <
> si...@apache.org>
> > > wrote:
> > > >
> > > > > Jay,
> > > > >
> > > > > Thank you for asking. Unfortunately we don't have a user
> > mail list
> > > > yet.
> > > > >
> > > > > Since you are asking, it might be a good chance to have
> one.
> > I need
> > > > to ask
> > > > > the podling mentors to see if we can create one.
> > > > >
> > > > > Thank you,
> > > > > Sijie
> > > > >
> > > > > On Sat, Sep 10, 2016 at 2:52 PM, Jay Juma <
> > jayk.j...@gmail.com>
> > > > wrote:
> > > > >
> > > > >> Hello,
> > > > >>
> > > > >> Is there a user mail list that I can join? I felt it is a
> > bit
> > > weird
> > > > to ask
> > > > >> some simple user questions in a dev mail list.
> > > > >>
> > > > >> Thanks,
> > > > >> Jay
> > > > >>
> > > > >
> > > > >
> > > >
> > > >
> > > >
> > >
> >
> >
> >
>