[ 
https://issues.apache.org/jira/browse/KAFKA-9148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16997207#comment-16997207
 ] 

adam Retter edited comment on KAFKA-9148 at 12/16/19 11:59 AM:
---------------------------------------------------------------

Sorry to hear about your troubles, I would like to put forward a case for not 
forking RocksDB:
{quote}We can avoid passing sudden breaking changes on to our users, such 
removal of methods with no deprecation period (see discussion on KAFKA-8897)
{quote}
>From KAFKA-8897:
{quote}They never deprecated it. CompactionOptionsFIFO#setTtl() is there in 
5.18.3, but is removed in 6.0.
{quote}
If we were following SemVer 2, then it is fine/expected to make breaking API 
changes in Major version changes. Regardless, the API of RocksJava closely 
follows the C++ API, if it was not deprecated in the C++ API, then it would not 
have been deprecated in the Java API.
{quote}Support for some architectures does not exist in all RocksDB versions, 
making Streams completely unusable for some users until we can upgrade the 
rocksdb dependency to one that supports their specific case
{quote}
As well as x86 and x86_64, we have added pcc64le and aarch64 recently, and 
these are both shipped since 6.4.6 (and ppc64le actually for much longer). We 
have very recently also added support for distributions using muslc instead of 
glibc, and that will be available for x86, x86_64, pcc64le, and aarch64 in our 
next release.

Oh... and we are experimenting with s390x at the moment too!
{quote}The Java API seems to be a very low priority to the rocksdb folks.
{quote}
The development of the Java API has fewer resources allocated to it, but is 
still important. One problem is that we receive very little feedback about the 
Java API, so it is hard to know who is using it and the issues they have. For 
example, I only just discovered this discussion about it!

I can also personally apologise. I get bombarded with things from many 
directions, and sometimes things fall through the cracks. If you can keep 
on-top-of your issues and remind me, then I will try and help where I can.
{quote}They leave out critical functionality, features, and configuration 
options that have been in the c++ API for a very long time
{quote}
The Java API tries to mirror the C++ API, and for the most part we have most of 
the functionality that users expect. The C++ API changes very rapidly, with 
things constantly being added, removed, re-added and modified. Some of these 
things are "experimental". In some ways, it is good that the Java API is a 
little behind the C++ API for stability. We try not to miss critical 
functionality though, and when this is raised as an issue we try to address it 
quickly. It is also worth adding that some things don't make sense to port into 
the Java API, this might be due to performance and language constraints. If you 
feel that there are critical things missing please open an issue for each one 
([https://github.com/facebook/rocksdb/issues]) and mention `@adamretter`.
{quote}Those that do make it over often have random gaps in the API such as 
setters but no getters (see [rocksdb PR 
#5186|https://github.com/facebook/rocksdb/pull/5186])
{quote}
This was a contribution from an external developer, and it was deemed that it 
improved the current situation even if it was not perfect. Please feel free to 
send PRs yourselves that also improve the situation further.
{quote}Others are poorly designed and require too many trips across the JNI, 
making otherwise incredibly useful features prohibitively expensive.
{quote}
Crossing the JNI boundary is expensive. That is a fact that we cannot ourselves 
change, this is down to those that specify the JVM and JNI fundamentals and 
also the JVM implementers. We can of course try to design our APIs to 
ameliorate these where possible. There is currently ongoing work in this area, 
some of which will be merged before the new year.
{quote}Custom comparator]: a custom comparator could significantly improve the 
performance of session windows. This is trivial to do but given the high 
performance cost of crossing the jni, it is currently only practical to use a 
c++ comparator
{quote}
This is really a fundamental JNI restriction. Things like Comparator and Merge 
Operator are callback based C++ APIs, and crossing that boundary into Java on 
every callback is expensive. We have some improvements to Java Comparators 
which will come in hopefully before the new year to remove the mutex around 
Slice reuse.

For you it would be sensible to implement your performance critical comparators 
in C++ and ship them as a native library. I believe there is some work 
in-progress for RocksDB to gain a mechanism for a module system to allow 
loading C++ comparators and merge operators dynamically. We could potentially 
expose that from the Java API to allow you to load your native comparators.
{quote}[Prefix Seek|https://github.com/facebook/rocksdb/issues/6004]: not 
currently used by Streams but a commonly requested feature, and may also allow 
improved range queries
{quote}
We can look at this after - [https://github.com/facebook/rocksdb/pull/2283]
{quote}Even when an external contributor develops a solution for poorly 
performing Java functionality and helpfully tries to contribute their patch 
back to rocksdb, it gets ignored by the rocksdb people ([rocksdb PR 
#2283|https://github.com/facebook/rocksdb/pull/2283])
{quote}
I spoke to Tomas this morning, and we agreed a way forward. He will send some 
follow-up modifications, and it will be merged ASAP. This has taken a lot 
longed than we all would have liked. Initially I deferred making a decision, as 
there was no reproducible performance data provided about the enhancement, and 
producing this myself was going to take time.


was (Author: adamretter):
Sorry to hear about your troubles, I would like to put forward a case for not 
forking RocksDB:

{quote}
 We can avoid passing sudden breaking changes on to our users, such removal of 
methods with no deprecation period (see discussion on KAFKA-8897)
{quote}

>From KAFKA-8897:
{quote}
They never deprecated it. CompactionOptionsFIFO#setTtl() is there in 5.18.3, 
but is removed in 6.0.
{quote}

If we were following SemVer 2, then it is fine/expected to make breaking API 
changes in Major version changes. Regardless, the API of RocksJava closely 
follows the C++ API, if it was not deprecated in the C++ API, then it would not 
have been deprecated in the Java API. 


{quote}
Support for some architectures does not exist in all RocksDB versions, making 
Streams completely unusable for some users until we can upgrade the rocksdb 
dependency to one that supports their specific case
{quote}

As well as x86 and x86_64, we have added pcc64le and aarch64 recently, and 
these are both shipped since 6.4.6 (and ppc64le actually for much longer). We 
have very recently also added support for distributions using muslc instead of 
glibc, and that will be available for x86, x86_64, pcc64le, and aarch64 in our 
next release.

{quote}
The Java API seems to be a very low priority to the rocksdb folks.
{quote}

The development of the Java API has fewer resources allocated to it, but is 
still important. One problem is that we receive very little feedback about the 
Java API, so it is hard to know who is using it and the issues they have. For 
example, I only just discovered this discussion about it!

I can also personally apologise. I get bombarded with things from many 
directions, and sometimes things fall through the cracks. If you can keep 
on-top-of your issues and remind me, then I will try and help where I can.

{quote}
They leave out critical functionality, features, and configuration options that 
have been in the c++ API for a very long time
{quote}

The Java API tries to mirror the C++ API, and for the most part we have most of 
the functionality that users expect. The C++ API changes very rapidly, with 
things constantly being added, removed, re-added and modified. Some of these 
things are "experimental". In some ways, it is good that the Java API is a 
little behind the C++ API for stability. We try not to miss critical 
functionality though, and when this is raised as an issue we try to address it 
quickly. It is also worth adding that some things don't make sense to port into 
the Java API, this might be due to performance and language constraints. If you 
feel that there are critical things missing please open an issue for each one 
(https://github.com/facebook/rocksdb/issues) and mention `@adamretter`.

{quote}
Those that do make it over often have random gaps in the API such as setters 
but no getters (see [rocksdb PR 
#5186|https://github.com/facebook/rocksdb/pull/5186])
{quote}
This was a contribution from an external developer, and it was deemed that it 
improved the current situation even if it was not perfect. Please feel free to 
send PRs yourselves that also improve the situation further.


{quote}
Others are poorly designed and require too many trips across the JNI, making 
otherwise incredibly useful features prohibitively expensive.
{quote}

Crossing the JNI boundary is expensive. That is a fact that we cannot ourselves 
change, this is down to those that specify the JVM and JNI fundamentals and 
also the JVM implementers. We can of course try to design our APIs to 
ameliorate these where possible. There is currently ongoing work in this area, 
some of which will be merged before the new year.

{quote}
Custom comparator]: a custom comparator could significantly improve the 
performance of session windows. This is trivial to do but given the high 
performance cost of crossing the jni, it is currently only practical to use a 
c++ comparator
{quote}

This is really a fundamental JNI restriction. Things like Comparator and Merge 
Operator are callback based C++ APIs, and crossing that boundary into Java on 
every callback is expensive. We have some improvements to Java Comparators 
which will come in hopefully before the new year to remove the mutex around 
Slice reuse.

For you it would be sensible to implement your performance critical comparators 
in C++ and ship them as a native library. I believe there is some work 
in-progress for RocksDB to gain a mechanism for a module system to allow 
loading C++ comparators and merge operators dynamically. We could potentially 
expose that from the Java API to allow you to load your native comparators.

{quote}
[Prefix Seek|https://github.com/facebook/rocksdb/issues/6004]: not currently 
used by Streams but a commonly requested feature, and may also allow improved 
range queries
{quote}

We can look at this after - https://github.com/facebook/rocksdb/pull/2283

{quote}
Even when an external contributor develops a solution for poorly performing 
Java functionality and helpfully tries to contribute their patch back to 
rocksdb, it gets ignored by the rocksdb people ([rocksdb PR 
#2283|https://github.com/facebook/rocksdb/pull/2283])
{quote}

I spoke to Tomas this morning, and we agreed a way forward. He will send some 
follow-up modifications, and it will be merged ASAP. This has taken a lot 
longed than we all would have liked. Initially I deferred making a decision, as 
there was no reproducible performance data provided about the enhancement, and 
producing this myself was going to take time.


> Consider forking RocksDB for Streams 
> -------------------------------------
>
>                 Key: KAFKA-9148
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9148
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Sophie Blee-Goldman
>            Priority: Major
>
> We recently upgraded our RocksDB dependency to 5.18 for its memory-management 
> abilities (namely the WriteBufferManager, see KAFKA-8215). Unfortunately, 
> someone from Flink recently discovered a ~8% [performance 
> regression|https://github.com/facebook/rocksdb/issues/5774] that exists in 
> all versions 5.18+ (up through the current newest version, 6.2.2). Flink was 
> able to react to this by downgrading to 5.17 and [picking the 
> WriteBufferManage|https://github.com/dataArtisans/frocksdb/pull/4]r to their 
> fork (fRocksDB).
> Due to this and other reasons enumerated below, we should consider also 
> forking our own RocksDB for Streams.
> Pros:
>  * We can avoid passing sudden breaking changes on to our users, such removal 
> of methods with no deprecation period (see discussion on KAFKA-8897)
>  * We can pick whichever version has the best performance for our needs, and 
> pick over any new features, metrics, etc that we need to use rather than 
> being forced to upgrade (and breaking user code, introducing regression, etc)
>  * Support for some architectures does not exist in all RocksDB versions, 
> making Streams completely unusable for some users until we can upgrade the 
> rocksdb dependency to one that supports their specific case
>  * The Java API seems to be a very low priority to the rocksdb folks.
>  ** They leave out critical functionality, features, and configuration 
> options that have been in the c++ API for a very long time
>  ** Those that do make it over often have random gaps in the API such as 
> setters but no getters (see [rocksdb PR 
> #5186|https://github.com/facebook/rocksdb/pull/5186])
>  ** Others are poorly designed and require too many trips across the JNI, 
> making otherwise incredibly useful features prohibitively expensive.
>  *** [Custom comparator|#issuecomment-83145980]]: a custom comparator could 
> significantly improve the performance of session windows. This is trivial to 
> do but given the high performance cost of crossing the jni, it is currently 
> only practical to use a c++ comparator
>  *** [Prefix Seek|https://github.com/facebook/rocksdb/issues/6004]: not 
> currently used by Streams but a commonly requested feature, and may also 
> allow improved range queries
>  ** Even when an external contributor develops a solution for poorly 
> performing Java functionality and helpfully tries to contribute their patch 
> back to rocksdb, it gets ignored by the rocksdb people ([rocksdb PR 
> #2283|https://github.com/facebook/rocksdb/pull/2283])
> Cons:
>  * More work (not to be trivialized, the truth is we don't and can't know how 
> much extra work this will ultimately be)
> Given that we rarely upgrade the Rocks dependency, use only some fraction of 
> its features, and would need or want to make only minimal changes ourselves, 
> it seems like we could actually get away with very little extra work by 
> forking rocksdb. Note that as of this writing the frocksdb repo has only 
> needed to open 5 PRs on top of the actual rocksdb (two of them trivial). Of 
> course, the LOE to maintain this will only grow over time, so we should think 
> carefully about whether and when to start taking on this potential burden.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to