[jira] [Created] (ARROW-3481) [Java] Fix Java building failure when use Maven 3.5.4
Yuhong Guo created ARROW-3481: - Summary: [Java] Fix Java building failure when use Maven 3.5.4 Key: ARROW-3481 URL: https://issues.apache.org/jira/browse/ARROW-3481 Project: Apache Arrow Issue Type: Bug Reporter: Yuhong Guo -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-3480) [Website] Install document for Ubuntu is broken
Kouhei Sutou created ARROW-3480: --- Summary: [Website] Install document for Ubuntu is broken Key: ARROW-3480 URL: https://issues.apache.org/jira/browse/ARROW-3480 Project: Apache Arrow Issue Type: Bug Components: Website Reporter: Kouhei Sutou Assignee: Kouhei Sutou [https://lists.apache.org/thread.html/11f0aee1ebde1a011816b84dcd3dca4f7bf14dd397b7531451870f29@%3Cuser.arrow.apache.org%3E] {quote} The instructions found here https://arrow.apache.org/install/ don't work. The /etc/apt/sources.list.d/red-data-tools.list file points to the 'main' component. The 'main' component only exists for Debian, for Ubuntu it should be 'universe'. Seen here: https://packages.red-data-tools.org/ubuntu/dists/bionic/universe {quote}
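The fix quoted above amounts to pointing the apt source at the 'universe' component for Ubuntu; the corrected sources.list line would presumably look like the following (the exact line is an assumption inferred from the repository layout quoted in the report, not taken from the actual install document):

```
deb https://packages.red-data-tools.org/ubuntu/ bionic universe
```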
[jira] [Created] (ARROW-3479) [R] Support to write record_batch as stream
Javier Luraschi created ARROW-3479: -- Summary: [R] Support to write record_batch as stream Key: ARROW-3479 URL: https://issues.apache.org/jira/browse/ARROW-3479 Project: Apache Arrow Issue Type: Improvement Components: R Reporter: Javier Luraschi Currently, one can only export a record batch to a file: {code:java} record <- arrow::record_batch(data.frame(a = c(1,2,3))) record$to_file() {code} But to improve performance in Spark's R bindings through sparklyr, it would be an improvement to support streams returning R raws, as follows: {code:java} record <- arrow::record_batch(data.frame(a = c(1,2,3))) record$to_stream(){code}
[jira] [Created] (ARROW-3478) [C++] API add value / null mask buffer accessor to ArrayData
Wolf Vollprecht created ARROW-3478: -- Summary: [C++] API add value / null mask buffer accessor to ArrayData Key: ARROW-3478 URL: https://issues.apache.org/jira/browse/ARROW-3478 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Wolf Vollprecht Currently, the ArrayData struct has the `std::vector<std::shared_ptr<Buffer>> buffers` member. The buffers will (from reading the code) be either [null_mask, data] or [data] (if the null mask does not exist). I'm not sure if there is an easy way to get the null mask reliably at the moment. If I am understanding correctly, the way to do it right now is to check whether the vector has one or two elements, and then use `buffers[0]` as the null mask and `buffers[1]` as the values. I also did not find information regarding this in the spec, so I am not sure if I can rely on this behavior in future versions of the library. I am wondering whether adding an explicit API for this would make it more reliable. For example, two more interface functions * `std::shared_ptr<Buffer> mask()` * `std::shared_ptr<Buffer> values()` would make it easy for me to rely on the interface to "do the right thing". Or am I missing something?
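The accessors proposed above can be illustrated with a small language-neutral sketch: given a buffer list that is either [null_mask, data] or just [data], return the two buffers unambiguously. The helper names mirror the proposal but are purely hypothetical, not the actual Arrow C++ API:

```python
# Hypothetical helpers mirroring the proposed mask()/values() accessors.
# "buffers" stands in for the ArrayData buffer vector described above.
def null_mask(buffers):
    # With two buffers, the first one is the validity (null) bitmap.
    return buffers[0] if len(buffers) == 2 else None

def values(buffers):
    # The value buffer is always the last element.
    return buffers[-1]

# With a validity bitmap present:
assert null_mask([b"\xff", b"data"]) == b"\xff"
assert values([b"\xff", b"data"]) == b"data"
# Without one:
assert null_mask([b"data"]) is None
assert values([b"data"]) == b"data"
```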
[jira] [Created] (ARROW-3477) Testsuite fails on 32 bit arch
Dmitry Kalinkin created ARROW-3477: -- Summary: Testsuite fails on 32 bit arch Key: ARROW-3477 URL: https://issues.apache.org/jira/browse/ARROW-3477 Project: Apache Arrow Issue Type: Bug Components: C++ Affects Versions: 0.11.0 Reporter: Dmitry Kalinkin Attachments: arrow_0.10.0_i686_test_fail.log, arrow_0.11.0_i686_test_fail.log While investigating PARQUET-1438 we discovered that there is a regression in the arrow-cpp testsuite results between versions 0.10.0 and 0.11.0 when running as a 32-bit executable. There used to be just a single failing test: * array-test Starting with 0.11.0 it's four tests: * array-test * buffer-test * bit-util-test * rle-encoding-test (list not including parquet-* tests) I bisected and found that the three additional tests were broken in 479c011a6ac7a8f1e6d77ecf651a4b2be9e5eec0
[jira] [Created] (ARROW-3476) [Java] mvn test in memory fails on a big-endian platform
Kazuaki Ishizaki created ARROW-3476: --- Summary: [Java] mvn test in memory fails on a big-endian platform Key: ARROW-3476 URL: https://issues.apache.org/jira/browse/ARROW-3476 Project: Apache Arrow Issue Type: Bug Components: Java Reporter: Kazuaki Ishizaki On a big-endian platform, {{mvn test}} in the {{java/memory}} module causes a failure due to an assertion. In the {{TestEndianess.testLittleEndian}} test suite, the assertion occurs during allocation of a {{RootAllocator}}. {code} $ uname -a Linux ppc64be.novalocal 4.5.7-300.fc24.ppc64 #1 SMP Fri Jun 10 20:29:32 UTC 2016 ppc64 ppc64 ppc64 GNU/Linux $ arch ppc64 $ cd java/memory $ mvn test [INFO] Scanning for projects... [INFO] [INFO] [INFO] Building Arrow Memory 0.12.0-SNAPSHOT [INFO] [INFO] ... [INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.082 s - in org.apache.arrow.memory.TestAccountant [INFO] Running org.apache.arrow.memory.TestLowCostIdentityHashMap [INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.012 s - in org.apache.arrow.memory.TestLowCostIdentityHashMap [INFO] Running org.apache.arrow.memory.TestBaseAllocator [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.746 s <<< FAILURE! - in org.apache.arrow.memory.TestEndianess [ERROR] testLittleEndian(org.apache.arrow.memory.TestEndianess) Time elapsed: 0.313 s <<< ERROR! java.lang.ExceptionInInitializerError at org.apache.arrow.memory.TestEndianess.testLittleEndian(TestEndianess.java:31) Caused by: java.lang.IllegalStateException: Arrow only runs on LittleEndian systems. at org.apache.arrow.memory.TestEndianess.testLittleEndian(TestEndianess.java:31) [ERROR] Tests run: 22, Failures: 0, Errors: 21, Skipped: 1, Time elapsed: 0.055 s <<< FAILURE! - in org.apache.arrow.memory.TestBaseAllocator ... {code}
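The failing assertion above is an endianness check performed when the allocator is initialized. A stand-alone Python sketch of the equivalent check (an illustration, not the actual Java implementation):

```python
import struct
import sys

# Sketch of the start-up check described above: refuse to run unless
# the platform is little-endian.
def check_little_endian():
    if sys.byteorder != "little":
        raise RuntimeError("Arrow only runs on LittleEndian systems.")

# Equivalent low-level probe: pack the integer 1 in native byte order
# and inspect the first byte. On a little-endian CPU the least
# significant byte comes first, so byte 0 is 1.
native_first_byte = struct.pack("=I", 1)[0]
print("little-endian:", native_first_byte == 1)
```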
Re: Nightly tests for Arrow
On Tue, Oct 9, 2018 at 6:02 PM Antoine Pitrou wrote: > > Le 09/10/2018 à 17:54, Wes McKinney a écrit : > > hi folks, > > > > After the packaging automation work for 0.10 was completed, we have > > stalled out a bit on one of the objectives of this framework, which is > > to allow contributors to define and add new tasks that can be run on > > demand or as part of a nightly job. > > > > So we have some problems to solve: > > > > * How to define a task we wish to validate (like building the API > > documentation, or building Arrow with some particular build > > parameters) as a new Crossbow task -- document this well so that > > people have some instructions to follow > Crossbow indeed lacks documentation in that area. Defining a task requires a CI configuration with commands per platform and a section in tasks.yml. However, I think this is not straightforward enough (unlike just creating a bash/batch script); we still need to define config management stuff (which makes user friendliness harder to achieve). > > * How to add a task to some kind of a nightly build manifest > > * Where to schedule and run the nightly jobs > Currently nightly builds are submitted by this nightly travis script: https://github.com/kszucs/crossbow/blob/trigger-nightly-builds/.travis.yml We can have an arbitrary number of branches to trigger custom jobs, but it requires manual Travis setup - with still unsatisfying ergonomics. > > * Reporting nightly build failures to the mailing list > I regularly check the nightly builds, which occasionally fail, mostly with transient failures. For example, the last conda nightlies failed because conda-build has some issues with libarchive - during the feedstock updates I couldn't even rerender them locally. BTW, to send the errors to the mailing list we need to set the CROSSBOW_EMAIL env variable https://github.com/apache/arrow/blob/master/dev/tasks/crossbow.py#L475 (we might want to use a centralized crossbow repository though, with proper permissions). 
> > > > In terms of scalability requirements, this needs to accommodate 50-100 > tasks. > The current tasks.yml contains a lot of duplication, which bothers me, but it provides more flexibility than having another "matrix" definition and implementation. I don't have a user-friendly solution for that yet. Parallelization is another question: a single crossbow repo can run ~5 travis jobs and a single appveyor job simultaneously, but we can improve that by introducing more CI services, e.g. pipelines and/or circleci. CI service agnostic? Ideally we should abstract away the CI service (the worker itself), which is where we do the configuration management right now; see the yml files: https://github.com/apache/arrow/tree/master/dev/tasks/conda-recipes But then we would need to create another, custom (I hope not yml) "dialect" to define build requirements (e.g. node, python, ruby, clang, etc.). It's quite hard to plan an easy and flexible interface for that. > > > > This won't be the last time we need to do some infrastructure work to > > scale our testing process, but this will help with testing things that > > we want to make sure work but without having to increase the size of > > our CI matrix. > > One question which came to my mind is how to develop, debug and maintain > the nightly tasks without waiting for the nightly Travis run for > validation. It doesn't seem easy to trigger a "nightly" build from the > Travis UI. > Good point! Triggering is not the actual issue, but the evaluation of the outcome is. We can submit builds if the PR touches e.g. the task definitions, but we cannot really wait for the results, so triggering builds could be useless. Actually this can be solved by the GitHub integration bot Wes mentioned, with manual triggering and approval. > > Regards > > Antoine. > All in all I feel usability is crucial here. A couple of examples of what a straightforward task definition should look like would be handy. 
Handling and defining task dependencies is another question too (I'm experimenting with a prototype though). Regards, Krisztian
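As a starting point for the examples mentioned above, a minimal task entry might look something like the following. This is purely an illustrative sketch: the keys, template path, and artifact names are hypothetical, not the actual Crossbow tasks.yml schema.

```yaml
# Hypothetical sketch only -- not the real tasks.yml schema.
tasks:
  docs-linux:
    platform: linux
    ci: travis
    template: docs/travis.linux.yml
    artifacts:
      - arrow-docs.tar.gz
```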
RE: Apache Arrow .NET implementation
I planned on contributing long term in an open-source capacity separate from my employer, but I imagine spending professional time as our product evolves. I don't have an Apache ID at the moment; I also forgot to include that in my ICLA, although I specified in my submission to the secretary the ID(s) I prefer. -Original Message- From: Wes McKinney [mailto:wesmck...@gmail.com] Sent: Tuesday, October 9, 2018 12:04 PM To: dev@arrow.apache.org Subject: Re: Apache Arrow .NET implementation hi Christopher, This is great to hear. Do you and your colleagues have plans to continue developing it if the donation is accepted into Apache Arrow? Thanks, Wes On Tue, Oct 9, 2018 at 12:01 PM Christopher S. Hutchinson wrote: > > I am writing to announce that my employer (Feyen Zylstra LLC) has developed > an implementation of Apache Arrow targeting .NET Standard. We are donating > this implementation to the Apache Software Foundation. The source code is > available for review on GitHub: > > https://github.com/feyenzylstra/apache-arrow > > Please let me know if you have any questions. Feel free to leave issues on > GitHub, and happy voting! >
Re: Nightly tests for Arrow
On Tue, Oct 9, 2018 at 12:02 PM Antoine Pitrou wrote: > > One question which came to my mind is how to develop, debug and maintain > the nightly tasks without waiting for the nightly Travis run for > validation. It doesn't seem easy to trigger a "nightly" build from the > Travis UI. I think we should develop a bot that we can ask to run tasks having a particular name or matching a wildcard. e.g. something like @arrow-test-bot validate conda > > Regards > > Antoine.
Re: Apache Arrow .NET implementation
hi Christopher, This is great to hear. Do you and your colleagues have plans to continue developing it if the donation is accepted into Apache Arrow? Thanks, Wes On Tue, Oct 9, 2018 at 12:01 PM Christopher S. Hutchinson wrote: > > I am writing to announce that my employer (Feyen Zylstra LLC) has developed > an implementation of Apache Arrow targeting .NET Standard. We are donating > this implementation to the Apache Software Foundation. The source code is > available for review on GitHub: > > https://github.com/feyenzylstra/apache-arrow > > Please let me know if you have any questions. Feel free to leave issues on > GitHub, and happy voting! >
Apache Arrow .NET implementation
I am writing to announce that my employer (Feyen Zylstra LLC) has developed an implementation of Apache Arrow targeting .NET Standard. We are donating this implementation to the Apache Software Foundation. The source code is available for review on GitHub: https://github.com/feyenzylstra/apache-arrow Please let me know if you have any questions. Feel free to leave issues on GitHub, and happy voting!
Re: Nightly tests for Arrow
Le 09/10/2018 à 17:54, Wes McKinney a écrit : > hi folks, > > After the packaging automation work for 0.10 was completed, we have > stalled out a bit on one of the objectives of this framework, which is > to allow contributors to define and add new tasks that can be run on > demand or as part of a nightly job. > > So we have some problems to solve: > > * How to define a task we wish to validate (like building the API > documentation, or building Arrow with some particular build > parameters) as a new Crossbow task -- document this well so that > people have some instructions to follow > * How to add a task to some kind of a nightly build manifest > * Where to schedule and run the nightly jobs > * Reporting nightly build failures to the mailing list > > In terms of scalability requirements, this needs to accommodate 50-100 tasks. > > This won't be the last time we need to do some infrastructure work to > scale our testing process, but this will help with testing things that > we want to make sure work but without having to increase the size of > our CI matrix. One question which came to my mind is how to develop, debug and maintain the nightly tasks without waiting for the nightly Travis run for validation. It doesn't seem easy to trigger a "nightly" build from the Travis UI. Regards Antoine.
Nightly tests for Arrow
hi folks, After the packaging automation work for 0.10 was completed, we have stalled out a bit on one of the objectives of this framework, which is to allow contributors to define and add new tasks that can be run on demand or as part of a nightly job. So we have some problems to solve: * How to define a task we wish to validate (like building the API documentation, or building Arrow with some particular build parameters) as a new Crossbow task -- document this well so that people have some instructions to follow * How to add a task to some kind of a nightly build manifest * Where to schedule and run the nightly jobs * Reporting nightly build failures to the mailing list In terms of scalability requirements, this needs to accommodate 50-100 tasks. This won't be the last time we need to do some infrastructure work to scale our testing process, but this will help with testing things that we want to make sure work but without having to increase the size of our CI matrix. Thoughts about how to proceed? Thanks Wes
[jira] [Created] (ARROW-3475) C++ Int64Builder.Finish(NumericArray)
Wolf Vollprecht created ARROW-3475: -- Summary: C++ Int64Builder.Finish(NumericArray) Key: ARROW-3475 URL: https://issues.apache.org/jira/browse/ARROW-3475 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Wolf Vollprecht I was intuitively thinking that the following code would work: {code} Status s; Int64Builder builder; s = builder.Append(1); s = builder.Append(2); std::shared_ptr<NumericArray<Int64Type>> array; builder.Finish(&array); {code} However, it does not seem to work, as the Finish operation is not overloaded in Int64Builder (or the numeric builder). Would it make sense to add this interface?
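The overload being asked for amounts to a Finish that yields the concrete array type directly, with no downcast at the call site. A language-neutral sketch of that pattern (all names hypothetical, not the Arrow C++ API):

```python
# Hypothetical sketch of a typed builder whose finish() returns the
# concrete array type directly, as the ticket proposes for Int64Builder.
class Int64Builder:
    def __init__(self):
        self._values = []

    def append(self, v: int) -> None:
        self._values.append(v)

    def finish(self) -> list:
        # Hand back the concrete ("Int64") array and reset the builder;
        # the caller never sees an untyped base class.
        out, self._values = self._values, []
        return out

b = Int64Builder()
b.append(1)
b.append(2)
assert b.finish() == [1, 2]
assert b.finish() == []  # builder is reset after finish
```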
[jira] [Created] (ARROW-3474) [Glib] Extend gparquet API with get_schema and read_column
Benoit Rostykus created ARROW-3474: -- Summary: [Glib] Extend gparquet API with get_schema and read_column Key: ARROW-3474 URL: https://issues.apache.org/jira/browse/ARROW-3474 Project: Apache Arrow Issue Type: Improvement Components: GLib Affects Versions: 0.11.0 Reporter: Benoit Rostykus So that we can read individual columns without loading the whole parquet file into memory, we need to surface the getSchema and ReadColumn functions of parquet-cpp in the parquet glib API.
Re: [RESULT][VOTE] Release Apache Arrow 0.11.0 (RC1)
Hi, > I clicked "Release" button at > https://repository.apache.org/#stagingRepositories but > https://search.maven.org/search?q=g:org.apache.arrow%20AND%20v:0.11.0 > shows nothing. There are 0.11.0 packages now. I should have waited... In <20181009.001150.1130060421909409464@clear-code.com> "Re: [RESULT][VOTE] Release Apache Arrow 0.11.0 (RC1)" on Tue, 09 Oct 2018 00:11:50 +0900 (JST), Kouhei Sutou wrote: > Hi, > > One problem for Java packages: > > I clicked "Release" button at > https://repository.apache.org/#stagingRepositories but > https://search.maven.org/search?q=g:org.apache.arrow%20AND%20v:0.11.0 > shows nothing. > > Can you help with this? > > > Here are the remaining tasks: > > * Updating the Arrow website > > https://cwiki.apache.org/confluence/display/ARROW/Release+Management+Guide#ReleaseManagementGuide-UpdatingtheArrowwebsite > > The only remaining sub-task is the blog post. Wes is working > on this: > https://github.com/apache/arrow/pull/2724 > > Can contributors for 0.11.0 confirm this? > > (I'll confirm this tomorrow.) > > * Updating website with new API documentation > > https://cwiki.apache.org/confluence/display/ARROW/Release+Management+Guide#ReleaseManagementGuide-UpdatingwebsitewithnewAPIdocumentation > > Krisztián is working on this: > https://github.com/apache/arrow/pull/2723 > > * Updating conda packages > > https://cwiki.apache.org/confluence/display/ARROW/Release+Management+Guide#ReleaseManagementGuide-Updatingcondapackages > > Krisztián is working on this. > > * Announcing release > > https://cwiki.apache.org/confluence/display/ARROW/Release+Management+Guide#ReleaseManagementGuide-Announcingrelease > > I'll do this after the above blog post is published. > > Other tasks in "Post-release tasks" > https://cwiki.apache.org/confluence/display/ARROW/Release+Management+Guide#ReleaseManagementGuide-Post-releasetasks > have been done. 
> > > Thanks, > -- > kou > > In <20181008.194105.48515664531298@clear-code.com> > "[RESULT][VOTE] Release Apache Arrow 0.11.0 (RC1)" on Mon, 08 Oct 2018 > 19:41:05 +0900 (JST), > Kouhei Sutou wrote: > >> With 3 binding +1 votes, 1 non-binding +1 and no other >> votes, the vote passes. Thanks all! >> >> I'll start "Post-release tasks". >> >> https://cwiki.apache.org/confluence/display/ARROW/Release+Management+Guide#ReleaseManagementGuide-Post-releasetasks >> >> Wes will write a blog post. >> >> Krisztián will create the conda-forge PRs. >> >> Any other helps are also welcome! >> >> >> Thanks, >> -- >> kou
[ANNOUNCE] Apache Arrow 0.11.0 released
The Apache Arrow community is pleased to announce the 0.11.0 release. It includes 288 resolved issues ([1]) since the 0.10.0 release. The release is available now from our website and [2]: https://arrow.apache.org/install/ Read about what's new in the release https://arrow.apache.org/blog/2018/10/09/0.11.0-release/ Changelog https://arrow.apache.org/release/0.11.0.html What is Apache Arrow? - Apache Arrow is a columnar in-memory analytics layer designed to accelerate big data. It houses a set of canonical in-memory representations of flat and hierarchical data along with multiple language-bindings for structure manipulation. It also provides low-overhead streaming and batch messaging, zero-copy interprocess communication (IPC), and vectorized in-memory analytics libraries. Please report any feedback to the mailing lists ([3]) Regards, The Apache Arrow community [1]: https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20fixVersion%20%3D%200.11.0%20ORDER%20BY%20priority%20DESC [2]: https://www.apache.org/dyn/closer.cgi/arrow/arrow-0.11.0/ [3]: https://lists.apache.org/list.html?dev@arrow.apache.org
[jira] [Created] (ARROW-3473) [Format] Update Layout.md document to clarify use of 64-bit array lengths
Wes McKinney created ARROW-3473: --- Summary: [Format] Update Layout.md document to clarify use of 64-bit array lengths Key: ARROW-3473 URL: https://issues.apache.org/jira/browse/ARROW-3473 Project: Apache Arrow Issue Type: Improvement Components: Format Reporter: Wes McKinney Fix For: 0.12.0 See https://github.com/apache/arrow/issues/2733. While 64-bit lengths are permitted, it is recommended to limit array sizes to 32-bit length or less. I will update
Re: [JAVA] Arrow performance measurement
Hi Wes and all, Here is another round of updates: Quick recap - previously we established that for 1kB binary blobs, Arrow can deliver > 160 Gbps performance from in-memory buffers. In this round I looked at the performance of materializing "integers". In my benchmarks, I found that with careful optimizations/code-rewriting we can push the performance of integer reading from 5.42 Gbps/core to 13.61 Gbps/core (~2.5x). The peak performance with 16 cores scales up to 110+ Gbps. The key things to do are: 1) Disable memory access checks in Arrow and Netty buffers. This gave a significant performance boost. However, for such an important performance flag, it is very poorly documented ("drill.enable_unsafe_memory_access=true"). 2) Materialize values from the Validity and Value direct buffers instead of calling the getInt() function on the IntVector. This is implemented as a new Unsafe reader type ( https://github.com/animeshtrivedi/benchmarking-arrow/blob/master/src/main/java/com/github/animeshtrivedi/benchmark/ArrowReaderUnsafe.java#L31 ) 3) Optimize the bitmap operation that checks whether a bit is set or not ( https://github.com/animeshtrivedi/benchmarking-arrow/blob/master/src/main/java/com/github/animeshtrivedi/benchmark/ArrowReaderUnsafe.java#L23 ) A detailed write-up of these steps is available here: https://github.com/animeshtrivedi/blog/blob/master/post/2018-10-09-arrow-int.md I have 2 follow-up questions: 1) Regarding the `isSet` function, why does it have to calculate the number of bits set? ( https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/BaseFixedWidthVector.java#L797). Wouldn't just checking whether the result of the AND operation is zero be sufficient? 
Like what I did: https://github.com/animeshtrivedi/benchmarking-arrow/blob/master/src/main/java/com/github/animeshtrivedi/benchmark/ArrowReaderUnsafe.java#L28 2) What is the reason behind this bitmap generation optimization here https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/BitVectorHelper.java#L179 ? At the point when this function is called, the bitmap vector has already been read from storage and contains the right values (either all null, all set, or whatever). Generating this mask here for the special cases when the values are all NULL or all set (this was the case in my benchmark) can be slower than just returning what one has read from storage. Collectively optimizing these two bitmap operations gives more than 1 Gbps gain in my benchmarking code. Cheers, -- Animesh On Thu, Oct 4, 2018 at 12:52 PM Wes McKinney wrote: > See e.g. > > > https://github.com/apache/arrow/blob/master/cpp/src/arrow/ipc/ipc-read-write-test.cc#L222 > > > On Thu, Oct 4, 2018 at 6:48 AM Animesh Trivedi > wrote: > > > > Primarily write the same microbenchmark as I have in Java in C++ for > table > > reading and value materialization. So just an example of equivalent > > ArrowFileReader example code in C++. Unit tests are a good starting > point, > > thanks for the tip :) > > > > On Thu, Oct 4, 2018 at 12:39 PM Wes McKinney > wrote: > > > > > > 3. Are there examples of Arrow in C++ read/write code that I can > have a > > > look? > > > > > > What kind of code are you looking for? 
I would direct you to relevant > > > unit tests that exhibit certain functionality, but it depends on what > > > you are trying to do > > > On Wed, Oct 3, 2018 at 9:45 AM Animesh Trivedi > > > wrote: > > > > > > > > Hi all - quick update on the performance investigation: > > > > > > > > - I spent some time looking at the performance profile for a binary blob > > > column > > > > (1024 bytes of byte[]) and found a few favorable settings for > delivering > > > up > > > > to 168 Gbps from the in-memory reading benchmark on 16 cores. These > settings > > > > (NUMA, JVM settings, Arrow holder API, and batch size, etc.) are > > > documented > > > > here: > > > > > > > > https://github.com/animeshtrivedi/blog/blob/master/post/2018-10-03-arrow-binary.md > > > > - these settings also help to improve the last number I reported (but > > > > not by much) for the in-memory TPC-DS store_sales table from ~39 > Gbps up > > > to > > > > ~45-47 Gbps (note: this number is just the in-memory benchmark, i.e., > w/o any > > > > networking or storage links) > > > > > > > > A few follow up questions that I have: > > > > 1. Arrow reads a batch size worth of data in one go. Are there any > > > > recommended batch sizes? In my investigation, small batch sizes help > with > > > a > > > > better cache profile but increase the number of instructions required > (more > > > > looping). Larger ones do otherwise. Somehow ~10MB/thread seems to be > the > > > best > > > > performing configuration, which is also a bit counter-intuitive, as > for 16 > > > > threads this will lead to a 160 MB memory footprint. Maybe this is > also > > > > tied to the memory management logic, which is my next question. > > > > 2. Arrow uses Netty's memory manager. (i) what are
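The cheap validity-bit test discussed in question 1 of the thread above can be sketched as follows. This is a pure-Python illustration of the bit arithmetic, not the actual Arrow Java code:

```python
# Check a single bit in a validity bitmap: locate the byte, mask the
# bit, and compare against zero -- no need to count the set bits.
def is_set(validity: bytes, index: int) -> bool:
    byte = validity[index >> 3]   # which byte holds the bit (index / 8)
    mask = 1 << (index & 7)       # bit position within that byte (index % 8)
    return (byte & mask) != 0

bitmap = bytes([0b00000101])  # bits 0 and 2 set
assert is_set(bitmap, 0)
assert not is_set(bitmap, 1)
assert is_set(bitmap, 2)
```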
[jira] [Created] (ARROW-3472) remove gandiva helpers library
Pindikura Ravindra created ARROW-3472: - Summary: remove gandiva helpers library Key: ARROW-3472 URL: https://issues.apache.org/jira/browse/ARROW-3472 Project: Apache Arrow Issue Type: Task Components: Gandiva Reporter: Pindikura Ravindra Assignee: Pindikura Ravindra Gandiva has two native libraries - libgandiva.so and libgandiva_helpers.so - the helpers one is mostly a duplicate and was added to get around unresolved symbols with Java/JNI. But this is a hack and needs to be cleaned up.
[jira] [Created] (ARROW-3471) [Gandiva] Investigate caching isomorphic expressions
Praveen Kumar Desabandu created ARROW-3471: -- Summary: [Gandiva] Investigate caching isomorphic expressions Key: ARROW-3471 URL: https://issues.apache.org/jira/browse/ARROW-3471 Project: Apache Arrow Issue Type: Task Reporter: Praveen Kumar Desabandu Fix For: 0.12.0 Two expressions, say add(a+b) and add(c+d), could potentially be reused if the only thing differing is the names. Test E2E.
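The caching idea described above amounts to canonicalizing field names before using the expression as a cache key, so that add(a+b) and add(c+d) hash to the same compiled code. A hedged sketch of one way to do that (all names illustrative, not the Gandiva implementation):

```python
# Canonicalize an expression by replacing concrete field names with
# positional placeholders; two expressions that differ only in names
# then produce the same cache key and can share compiled code.
def cache_key(op, fields):
    return (op, tuple(f"_{i}" for i, _ in enumerate(fields)))

# add(a, b) and add(c, d) are isomorphic -> same key
assert cache_key("add", ["a", "b"]) == cache_key("add", ["c", "d"])
# A different operation must not collide
assert cache_key("add", ["a", "b"]) != cache_key("mul", ["a", "b"])
```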
[jira] [Created] (ARROW-3470) [C++] Row-wise conversion tutorial has fallen out of date
Wes McKinney created ARROW-3470: --- Summary: [C++] Row-wise conversion tutorial has fallen out of date Key: ARROW-3470 URL: https://issues.apache.org/jira/browse/ARROW-3470 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: Wes McKinney Fix For: 0.12.0 As reported on user@ list