Hi Valentin! I've created:
new method strToUtf8BytesDirect in BinaryUtilsNew
https://github.com/javaller/MyBenchmark/blob/master/src/main/java/org/sample/BinaryUtilsNew.java

new method doWriteStringDirect in BinaryWriterExImplNew
https://github.com/javaller/MyBenchmark/blob/master/src/main/java/org/sample/BinaryWriterExImplNew.java

benchmarks for BinaryWriterExImpl doWriteString and BinaryWriterExImplNew doWriteStringDirect
https://github.com/javaller/MyBenchmark/blob/master/src/main/java/org/sample/ExampleTest.java

This is the result of the comparison:

Benchmark                                   Mode  Cnt        Score       Error  Units
ExampleTest.binaryHeapOutputStreamDirect    avgt   50  1128448,743 ± 13536,689  ns/op
ExampleTest.binaryHeapOutputStreamInDirect  avgt   50  1127270,695 ± 17309,256  ns/op

Vadim

2017-03-02 1:02 GMT+03:00 Valentin Kulichenko <valentin.kuliche...@gmail.com>:

> Hi Vadim,
>
> We're getting closer :) I would actually like to see the test for the actual
> implementation of the BinaryWriterExImpl#doWriteString method. The logic in
> binaryHeapOutputInDirect() confuses me a bit and I'm not sure the comparison
> is valid.
>
> Can you please do the following:
>
> 1. Create a new BinaryUtils#strToUtf8BytesDirect method: copy-paste the
>    code from the existing BinaryUtils#strToUtf8Bytes and modify it so that it
>    takes a BinaryOutputStream as an argument and writes to it directly. Do not
>    create a stream inside this method, as that is the same as creating a new array.
> 2. Create a new BinaryWriterExImpl#doWriteStringDirect: copy-paste the code
>    from the existing BinaryWriterExImpl#doWriteString and modify it so that it
>    uses BinaryUtils#strToUtf8BytesDirect and doesn't call out.writeByteArray.
> 3. Create a benchmark for the BinaryWriterExImpl#doWriteString method. I.e.,
>    create an instance of BinaryWriterExImpl and call doWriteString() in the
>    benchmark method.
> 4. Similarly, create a benchmark for BinaryWriterExImpl#doWriteStringDirect.
> 5. Compare results.
>
> This will give us a clear picture of how these two approaches perform.
> Your current results are actually promising, but I would like to confirm them.
>
> -Val
>
> On Wed, Mar 1, 2017 at 6:17 AM, Вадим Опольский <vaopols...@gmail.com> wrote:
>
>> Hi Valentin!
>>
>> Thank you for comments.
>>
>> There is a new method which writes directly to BinaryOutputStream instead
>> of intermediate array.
>> https://github.com/javaller/MyBenchmark/blob/master/src/main/java/org/sample/BinaryUtilsNew.java
>>
>> There is benchmark.
>> https://github.com/javaller/MyBenchmark/blob/master/src/main/java/org/sample/MyBenchmark.java
>>
>> Unit test
>> https://github.com/javaller/MyBenchmark/blob/master/src/main/java/org/sample/BinaryOutputStreamTest.java
>>
>> Statistics
>> https://github.com/javaller/MyBenchmark/blob/master/out_01_03_17.txt
>>
>> Benchmark                                 Mode  Cnt    Score   Error  Units
>> MyBenchmark.binaryHeapOutputInDirect      avgt   50  111,337 ± 0,742  ns/op
>> MyBenchmark.binaryHeapOutputStreamDirect  avgt   50   23,847 ± 0,303  ns/op
>>
>> Vadim
>>
>> 2017-02-28 4:29 GMT+03:00 Valentin Kulichenko <valentin.kuliche...@gmail.com>:
>>
>>> Hi Vadim,
>>>
>>> Looks like you accidentally removed dev list from the thread, adding it
>>> back.
>>>
>>> I think there is still misunderstanding. What I propose is to modify
>>> the BinaryUtils#strToUtf8Bytes so that it writes directly to
>>> BinaryOutputStream instead of intermediate array. This should decrease
>>> memory consumption and can also increase performance as we will avoid
>>> 'writeByteArray' step at the end.
>>>
>>> Does it make sense to you?
>>>
>>> -Val
>>>
>>> On Mon, Feb 27, 2017 at 6:55 AM, Вадим Опольский <vaopols...@gmail.com> wrote:
>>>
>>>> Hi, Valentin!
>>>>
>>>> What do you think about using the methods of BinaryOutputStream:
>>>>
>>>> 1) writeByteArray(byte[] val)
>>>> 2) writeCharArray(char[] val)
>>>> 3) write(byte[] arr, int off, int len)
>>>>
>>>> String val = "Test";
>>>> out.writeByteArray(val.getBytes(UTF_8));
>>>>
>>>> String val = "Test";
>>>> out.writeCharArray(val.toCharArray());
>>>>
>>>> String val = "Test";
>>>> InputStream stream = new ByteArrayInputStream(
>>>>     val.getBytes(StandardCharsets.UTF_8));
>>>> byte[] buffer = new byte[1024];
>>>> int len;
>>>> // read(byte[]) returns the number of bytes read, or -1 at end of stream
>>>> while ((len = stream.read(buffer)) != -1) {
>>>>     out.write(buffer, 0, len);
>>>> }
>>>>
>>>> What else can we use?
>>>>
>>>> Vadim
>>>>
>>>> 2017-02-25 2:21 GMT+03:00 Valentin Kulichenko <valentin.kuliche...@gmail.com>:
>>>>
>>>>> Hi Vadim,
>>>>>
>>>>> Which method implements the approach described in the ticket? From
>>>>> what I see, all writeToStringX versions are still encoding into an
>>>>> intermediate array and then call out.writeByteArray. What we need to test
>>>>> is the approach where bytes are written directly into the stream during
>>>>> encoding. Encoding algorithm itself should stay the same for now,
>>>>> otherwise we will not know how to interpret the result.
>>>>>
>>>>> It looks like there is some misunderstanding here, so please let me
>>>>> know if anything is still unclear. I will be happy to answer your questions.
>>>>>
>>>>> -Val
>>>>>
>>>>> On Wed, Feb 22, 2017 at 7:22 PM, Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:
>>>>>
>>>>>> Hi Vadim,
>>>>>>
>>>>>> Thanks, I will review this week.
>>>>>>
>>>>>> -Val
>>>>>>
>>>>>> On Wed, Feb 22, 2017 at 2:28 AM, Вадим Опольский <vaopols...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Valentin!
>>>>>>>
>>>>>>> https://issues.apache.org/jira/browse/IGNITE-13
>>>>>>>
>>>>>>> I created BinaryWriterExImplNew (an extension of BinaryWriterExImpl) and
>>>>>>> added new methods with the changes described in the ticket
>>>>>>>
>>>>>>> https://github.com/javaller/MyBenchmark/blob/master/src/main/java/org/sample/BinaryWriterExImplNew.java
>>>>>>>
>>>>>>> I created a benchmark for BinaryWriterExImplNew
>>>>>>>
>>>>>>> https://github.com/javaller/MyBenchmark/blob/master/src/main/java/org/sample/ExampleTest.java
>>>>>>>
>>>>>>> I ran the benchmark and compared the results
>>>>>>>
>>>>>>> https://github.com/javaller/MyBenchmark/blob/master/totalstat.txt
>>>>>>>
>>>>>>> # Run complete. Total time: 00:10:24
>>>>>>> Benchmark                                    Mode  Cnt        Score       Error  Units
>>>>>>> ExampleTest.binaryHeapOutputStream1          avgt   50  1114999,207 ± 16756,776  ns/op
>>>>>>> ExampleTest.binaryHeapOutputStream2          avgt   50  1118149,320 ± 17515,961  ns/op
>>>>>>> ExampleTest.binaryHeapOutputStream3          avgt   50  1113678,657 ± 17652,314  ns/op
>>>>>>> ExampleTest.binaryHeapOutputStream4          avgt   50  1112415,051 ± 18273,874  ns/op
>>>>>>> ExampleTest.binaryHeapOutputStream5          avgt   50  1111366,583 ± 18282,829  ns/op
>>>>>>> ExampleTest.binaryHeapOutputStreamACSII      avgt   50  1112079,667 ± 16659,532  ns/op
>>>>>>> ExampleTest.binaryHeapOutputStreamUTFCustom  avgt   50  1114949,759 ± 16809,669  ns/op
>>>>>>> ExampleTest.binaryHeapOutputStreamUTFNIO     avgt   50  1121462,325 ± 19836,466  ns/op
>>>>>>>
>>>>>>> Is it OK? What's the next step? Do I have to move this JMH benchmark
>>>>>>> to the Ignite project?
>>>>>>>
>>>>>>> Vadim Opolski
>>>>>>>
>>>>>>> 2017-02-21 1:06 GMT+03:00 Valentin Kulichenko <valentin.kuliche...@gmail.com>:
>>>>>>>
>>>>>>>> Hi Vadim,
>>>>>>>>
>>>>>>>> I'm not sure I understand your benchmarks and how they verify the
>>>>>>>> optimization discussed here.
>>>>>>>> Basically, here is what needs to be done:
>>>>>>>>
>>>>>>>> 1. Create a benchmark for BinaryWriterExImpl#doWriteString method.
>>>>>>>> 2. Run the benchmark with current implementation.
>>>>>>>> 3. Make the change described in the ticket.
>>>>>>>> 4. Run the benchmark with these changes.
>>>>>>>> 5. Compare results.
>>>>>>>>
>>>>>>>> Makes sense? Let me know if anything is unclear.
>>>>>>>>
>>>>>>>> -Val
>>>>>>>>
>>>>>>>> On Mon, Feb 20, 2017 at 8:51 AM, Вадим Опольский <vaopols...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hello everybody!
>>>>>>>>>
>>>>>>>>> https://issues.apache.org/jira/browse/IGNITE-13
>>>>>>>>>
>>>>>>>>> Valentin, I have just finished the benchmark (with JMH) -
>>>>>>>>> https://github.com/javaller/MyBenchmark.git
>>>>>>>>>
>>>>>>>>> It collects data on how long serialization takes.
>>>>>>>>>
>>>>>>>>> For instance - https://github.com/javaller/MyBenchmark/blob/master/out200217.txt
>>>>>>>>>
>>>>>>>>> To start it you have to do the following:
>>>>>>>>>
>>>>>>>>> 1) clone it - git clone https://github.com/javaller/MyBenchmark.git
>>>>>>>>>
>>>>>>>>> 2) install it - mvn install
>>>>>>>>>
>>>>>>>>> 3) run benchmarks - java -Xms1024m -Xmx4096m -jar target\benchmarks.jar
>>>>>>>>>
>>>>>>>>> Vadim Opolski
>>>>>>>>>
>>>>>>>>> 2017-02-15 0:52 GMT+03:00 Valentin Kulichenko <valentin.kuliche...@gmail.com>:
>>>>>>>>>
>>>>>>>>>> Vladimir,
>>>>>>>>>>
>>>>>>>>>> I think we misunderstood each other. My understanding of this
>>>>>>>>>> optimization is the following.
>>>>>>>>>>
>>>>>>>>>> Currently string serialization is done in two steps (see
>>>>>>>>>> BinaryWriterExImpl#doWriteString):
>>>>>>>>>>
>>>>>>>>>> strArr = BinaryUtils.strToUtf8Bytes(val); // Encode string into byte array.
>>>>>>>>>> out.writeByteArray(strArr);               // Write byte array into stream.
>>>>>>>>>>
>>>>>>>>>> What this ticket suggests is to write directly into stream while
>>>>>>>>>> string is encoded, without intermediate array. This both reduces
>>>>>>>>>> memory consumption and eliminates array copy step.
>>>>>>>>>>
>>>>>>>>>> I updated the ticket and added this explanation there.
>>>>>>>>>>
>>>>>>>>>> Vadim, can you create a micro benchmark and check if it gives any
>>>>>>>>>> improvement?
>>>>>>>>>>
>>>>>>>>>> -Val
>>>>>>>>>>
>>>>>>>>>> On Sun, Feb 12, 2017 at 10:38 PM, Vladimir Ozerov <voze...@gridgain.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> It is hard to say whether it makes sense or not. No doubt, it
>>>>>>>>>>> could speed up marshalling process at the cost of 2x memory
>>>>>>>>>>> required for strings. From my previous experience with marshalling
>>>>>>>>>>> micro-optimizations, we will hardly ever notice speedup in
>>>>>>>>>>> distributed environment.
>>>>>>>>>>>
>>>>>>>>>>> But, there is another side - it could speed up our queries,
>>>>>>>>>>> because we will not have to unmarshal string on every field access.
>>>>>>>>>>> So I would try to make this optimization optional and then measure
>>>>>>>>>>> query performance with classes having lots of strings. It could
>>>>>>>>>>> give us interesting results.
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Feb 13, 2017 at 5:37 AM, Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Vladimir,
>>>>>>>>>>>>
>>>>>>>>>>>> Can you please take a look and provide your thoughts? Can this
>>>>>>>>>>>> be applied to binary marshaller? From what I recall, it serializes
>>>>>>>>>>>> string a bit differently from optimized marshaller, so I'm not sure.
>>>>>>>>>>>>
>>>>>>>>>>>> -Val
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Feb 10, 2017 at 5:16 PM, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Feb 9, 2017 at 11:26 PM, Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> > Hi Vadim,
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > I don't think it makes much sense to invest into OptimizedMarshaller.
>>>>>>>>>>>>> > However, I would check if this optimization is applicable to
>>>>>>>>>>>>> > BinaryMarshaller, and if yes, implement it.
>>>>>>>>>>>>> >
>>>>>>>>>>>>>
>>>>>>>>>>>>> Val, in this case can you please update the ticket?
>>>>>>>>>>>>>
>>>>>>>>>>>>> > -Val
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > On Thu, Feb 9, 2017 at 11:05 PM, Вадим Опольский <vaopols...@gmail.com> wrote:
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > > Dear sirs!
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> > > I want to resolve issue IGNITE-13 -
>>>>>>>>>>>>> > > https://issues.apache.org/jira/browse/IGNITE-13
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> > > Is it still relevant?
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> > > Vadim Opolski
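
For anyone reading this thread later: the two approaches being compared (encode into an intermediate array and then call writeByteArray, versus writing bytes into the stream while the string is being encoded) can be sketched roughly as below. This is only an illustration under assumptions, not the Ignite code: the class Utf8DirectSketch and the methods writeViaArray and writeDirect are hypothetical names, java.io.ByteArrayOutputStream stands in for BinaryOutputStream, and the encoder is plain standard UTF-8 rather than a copy of BinaryUtils#strToUtf8Bytes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: ByteArrayOutputStream stands in for Ignite's BinaryOutputStream.
class Utf8DirectSketch {
    /** Two-step approach: encode into an intermediate array, then copy it into the stream. */
    static void writeViaArray(String val, ByteArrayOutputStream out) {
        byte[] arr = val.getBytes(StandardCharsets.UTF_8); // intermediate array
        out.write(arr, 0, arr.length);                     // extra copy into the stream
    }

    /** Direct approach: every encoded byte goes straight into the stream. */
    static void writeDirect(String val, ByteArrayOutputStream out) {
        for (int i = 0; i < val.length(); ) {
            int cp = val.codePointAt(i);      // handles surrogate pairs
            i += Character.charCount(cp);

            if (cp < 0x80) {                  // 1 byte
                out.write(cp);
            } else if (cp < 0x800) {          // 2 bytes
                out.write(0xC0 | (cp >> 6));
                out.write(0x80 | (cp & 0x3F));
            } else if (cp < 0x10000) {        // 3 bytes
                out.write(0xE0 | (cp >> 12));
                out.write(0x80 | ((cp >> 6) & 0x3F));
                out.write(0x80 | (cp & 0x3F));
            } else {                          // 4 bytes
                out.write(0xF0 | (cp >> 18));
                out.write(0x80 | ((cp >> 12) & 0x3F));
                out.write(0x80 | ((cp >> 6) & 0x3F));
                out.write(0x80 | (cp & 0x3F));
            }
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream viaArray = new ByteArrayOutputStream();
        ByteArrayOutputStream direct = new ByteArrayOutputStream();

        writeViaArray("Test", viaArray);
        writeDirect("Test", direct);

        // Both paths should produce the same bytes for well-formed strings.
        System.out.println(Arrays.equals(viaArray.toByteArray(), direct.toByteArray()));
    }
}
```

The sketch only covers well-formed strings (a lone surrogate is encoded differently by getBytes), and it leaves out anything written before the string bytes. If the real format stores a length in front of the bytes, a direct version would have to compute that length up front or patch it in afterwards, and that extra work may affect what the benchmark shows.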