Re: Arrow Flight + Go, Arrow for Realtime

2020-08-12 Thread Sebastien Binet
Mark,

AFAIK, nobody's actively working on Arrow-Flight for Go (I think somebody
started that work at some point, but I don't remember anything hitting the main repo).

as for Go+WASM:

https://lists.apache.org/thread.html/e15dc80debf9dea1b33581fa6ba95fd84b57c0ccd0162505d5d25079%40%3Cdev.arrow.apache.org%3E

i.e.:
===
I've just tried compiling this example:
-  https://godoc.org/github.com/apache/arrow/go/arrow#example-package--Table
to wasm.
compilation went fine:

$> GOOS=js GOARCH=wasm go build -o foo.wasm foo.go
$> go-wasm ./foo.wasm
rec[0]["f1-i32"]: [1 2 3 4 5]
rec[0]["f2-f64"]: [1 2 3 4 5]
rec[1]["f1-i32"]: [6 7 8 (null) 10]
rec[1]["f2-f64"]: [6 7 8 9 10]
rec[2]["f1-i32"]: [11 12 13 14 15]
rec[2]["f2-f64"]: [11 12 13 14 15]
rec[3]["f1-i32"]: [16 17 18 19 20]
rec[3]["f2-f64"]: [16 17 18 19 20]

and it ran fine once this patch was added:
- https://github.com/apache/arrow/pull/3707

hth,
-s

PS: go-wasm is an alias of mine for this file:
https://github.com/golang/go/blob/master/misc/wasm/go_js_wasm_exec
===


hth,
-s



‐‐‐ Original Message ‐‐‐
On Wednesday, August 12, 2020 11:29 AM,  wrote:

> I'm looking at using Arrow for a realtime IoT project which includes use
> cases both on the server and for transferring/using in a browser via
> WASM, and have a few questions.
>
> Language in use is Go.
>
> Is anyone working on implementing Arrow-Flight in Go? (According to
> the feature matrix, nothing is ready yet, so I wanted to check.)
>
> Has anyone tried using Apache Arrow in Go WASM (WebAssembly)? If so,
> any issues?
>
> Any pointers/documentation on using/extending Arrow for realtime streaming
> cases? (Specifically where a DataFrame is requested, but then it needs to
> 'grow' as new data arrives, often at high speed.)
>
> Not language specific, just trying to understand the right pattern for using
> Arrow for this, and couldn't find much in the docs.
>
> Regards
>
> Mark.




Re: [ANNOUNCE] New Arrow PMC member: Sebastien Binet

2019-08-16 Thread Sebastien Binet
Thanks everyone!

I'll try to continue and do my best :)

cheers,
-s

On Tue, Aug 13, 2019 at 10:55 PM Wes McKinney  wrote:

> The Project Management Committee (PMC) for Apache Arrow has invited
> Sebastien Binet to become a PMC member and we are pleased to announce
> that Sebastien has accepted.
>
> Congratulations and welcome!
>


Re: [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)

2019-08-20 Thread Sebastien Binet
here is my vote: +1


On Tue, Aug 20, 2019 at 3:33 PM Wes McKinney  wrote:

> We need some more PMC members to look at this vote. I know the issue
> is esoteric but please let us know if you have questions.
>
> On Thu, Aug 15, 2019 at 3:35 AM Micah Kornfield 
> wrote:
> >
> > >
> > >
> > > Actually, it looks like my Mac's version of UBSAN doesn't detect the issue
> > > at all.  I will try on Linux by EOD.
> >
> > Actually, the issue was I had alignment checks off.  I verified this works
> > and it appears there is a UBSan issue with flatbuffers::Verify (I'll try to
> > see if I can find the issue and make a PR upstream).
> >
> > On Thu, Aug 15, 2019 at 1:03 AM Micah Kornfield 
> > wrote:
> >
> > >> I verified with these changes [1], without backwards compatibility
> > >> support, UBSAN runs cleanly for IPC tests in C++
> > >
> > > Actually, it looks like my Mac's version of UBSAN doesn't detect the issue
> > > at all.  I will try on Linux by EOD.
> > >
> > > On Thu, Aug 15, 2019 at 12:52 AM Micah Kornfield <emkornfi...@gmail.com>
> > > wrote:
> > >
> > >> +1
> > >>
> > >> I verified with these changes [1], without backwards compatibility
> > >> support, UBSAN runs cleanly for IPC tests in C++
> > >>
> > >> Just wanted to clarify:
> > >>
> > >>> Additionally with this vote, we want to formally approve the change to
> > >>> the Arrow "file" format to always write the (new 8-byte) end-of-stream
> > >>> marker, which enables code that processes Arrow streams to safely read
> > >>> the file's internal messages as though they were a normal stream.
> > >>
> > >> This only allows for reading messages safely, we still aren't
> > >> guaranteeing dictionary batches occur in the file before they are used,
> > >> correct?
> > >>
> > >> Thanks,
> > >> Micah
> > >>
> > >> [1]
> > >> https://github.com/emkornfield/arrow/commit/8b8348d8bcf62b50c35ddb4926f3d501b4f7147c
> > >>
> > >>
> > >> On Wed, Aug 14, 2019 at 3:43 PM Wes McKinney wrote:
> > >>
> > >>> hi all,
> > >>>
> > >>> As we've been discussing [1], there is a need to introduce 4 bytes of
> > >>> padding into the preamble of the "encapsulated IPC message" format to
> > >>> ensure that the Flatbuffers metadata payload begins on an 8-byte
> > >>> aligned memory offset. The alternative to this would be for Arrow
> > >>> implementations where alignment is important (e.g. C or C++) to copy
> > >>> the metadata (which is not always small) into memory when it is
> > >>> unaligned.
> > >>>
> > >>> Micah has proposed to address this by adding a
> > >>> 4-byte "continuation" value at the beginning of the payload
> > >>> having the value 0x. The reason to do it this way is that
> > >>> old clients will see an invalid length (what is currently the
> > >>> first 4 bytes of the message -- a 32-bit little endian signed
> > >>> integer indicating the metadata length) rather than potentially
> > >>> crashing on a valid length. We also propose to expand the "end of
> > >>> stream" marker used in the stream and file format from 4 to 8
> > >>> bytes. This has the additional effect of aligning the file footer
> > >>> defined in File.fbs.
> > >>>
> > >>> This would be a backwards incompatible protocol change, so older Arrow
> > >>> libraries would not be able to read these new messages. Maintaining
> > >>> forward compatibility (reading data produced by older libraries) would
> > >>> be possible as we can reason that a value other than the continuation
> > >>> value was produced by an older library (and then validate the
> > >>> Flatbuffer message of course). Arrow implementations could offer a
> > >>> backward compatibility mode for the sake of old readers if they desire
> > >>> (this may also assist with testing).
> > >>>
> > >>> Additionally with this vote, we want to formally approve the change to
> > >>> the Arrow "file" format to always write the (new 8-byte) end-of-stream
> > >>> marker, which enables code that processes Arrow streams to safely read
> > >>> the file's internal messages as though they were a normal stream.
> > >>>
> > >>> The PR making these changes to the IPC documentation is here
> > >>>
> > >>> https://github.com/apache/arrow/pull/4951
> > >>>
> > >>> Please vote to accept these changes. This vote will be open for at
> > >>> least 72 hours
> > >>>
> > >>> [ ] +1 Adopt these Arrow protocol changes
> > >>> [ ] +0
> > >>> [ ] -1 I disagree because...
> > >>>
> > >>> Here is my vote: +1
> > >>>
> > >>> Thanks,
> > >>> Wes
> > >>>
> > >>> [1]:
> > >>> https://lists.apache.org/thread.html/8440be572c49b7b2ffb76b63e6d935ada9efd9c1c2021369b6d27786@%3Cdev.arrow.apache.org%3E
> > >>>
> > >>
>
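
The framing described in the proposal quoted above can be illustrated with a short Go sketch that needs only `encoding/binary`. It is not the Go Arrow IPC code: the helper names are hypothetical, the metadata is an opaque byte slice standing in for a real Flatbuffer, and the continuation value is assumed to be 0xFFFFFFFF, as in the published Arrow IPC specification. It shows the three points of the vote: the continuation marker pushes the Flatbuffer to offset 8, the metadata length is padded so what follows stays 8-byte aligned, and a reader can still accept the legacy 4-byte prefix.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// continuation is assumed to be 0xFFFFFFFF. Read as an int32 length it is -1,
// so an old reader sees an invalid length instead of misparsing the message.
const continuation uint32 = 0xFFFFFFFF

// pad8 rounds n up to the next multiple of 8.
func pad8(n int) int { return (n + 7) &^ 7 }

// frameMessage (hypothetical) prefixes a metadata payload with the
// continuation marker and its padded length. The 8-byte prefix puts the
// metadata on an 8-byte boundary; padding the metadata keeps the body aligned.
func frameMessage(meta []byte) []byte {
	padded := pad8(len(meta))
	buf := make([]byte, 8+padded) // trailing padding bytes stay zero
	binary.LittleEndian.PutUint32(buf[0:4], continuation)
	binary.LittleEndian.PutUint32(buf[4:8], uint32(padded))
	copy(buf[8:], meta)
	return buf
}

// readPrefix (hypothetical) returns the metadata length and the offset at
// which the metadata starts, accepting both the new 8-byte prefix and the
// legacy 4-byte one written by older libraries.
func readPrefix(b []byte) (metaLen int32, offset int, err error) {
	if len(b) < 4 {
		return 0, 0, errors.New("short prefix")
	}
	first := binary.LittleEndian.Uint32(b[0:4])
	if first != continuation {
		// Legacy format: the first 4 bytes are the int32 metadata length.
		return int32(first), 4, nil
	}
	if len(b) < 8 {
		return 0, 0, errors.New("short prefix")
	}
	return int32(binary.LittleEndian.Uint32(b[4:8])), 8, nil
}

// endOfStream returns the 8-byte end-of-stream marker: the continuation
// value followed by a zero metadata length.
func endOfStream() []byte {
	eos := make([]byte, 8)
	binary.LittleEndian.PutUint32(eos[0:4], continuation)
	return eos
}

func main() {
	msg := frameMessage([]byte{1, 2, 3})
	n, off, _ := readPrefix(msg)
	fmt.Println(len(msg), n, off) // 16 8 8
	n, off, _ = readPrefix(endOfStream())
	fmt.Println(n, off) // 0 8
}
```

A reader that sees the marker followed by a zero length knows the stream has ended; anything else is a message whose metadata starts at the returned offset.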


Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)

2019-09-12 Thread Sebastien Binet
hi there,

On Thu, Sep 12, 2019 at 12:45 AM Wes McKinney  wrote:

> Thanks Bryan.
>
> I merged the Java patch with the EOS change and submitted a C++ patch
> which also updates the specification
>
> https://github.com/apache/arrow/pull/5361
>
> Let me know when the JS or C# patches are ready to go and I can merge
> those.
>
> I updated https://issues.apache.org/jira/browse/ARROW-6545 to track
> the Go change corresponding to this
>

done. (I think)

-s


Re: Apache Arrow build with needed dependencies only

2019-11-15 Thread Sebastien Binet
hi Richard,

On Thu, Nov 7, 2019 at 5:00 PM Richard Bachmann 
wrote:

> Hello,
> I'm contacting you on behalf of the LCG Releases team at CERN. We
> provide a common software stack for LHCb, ATLAS and others to be used at
> CERN and the worldwide computing grid.
>

you may want to reach out to the ALICE community:
they routinely build Arrow as part of their software stack.
here is the recipe:
- https://github.com/alisw/alidist/blob/master/arrow.sh

-s

PS: IIRC, lxplus has Go installed (a rather old version, though: 1.8), so
one could install Go-Arrow in one go:
$> go get github.com/apache/arrow/go/arrow/...


Re: [DISCUSS] Adding "trivial" buffer compression option to IPC protocol (ARROW-300)

2020-03-25 Thread Sebastien Binet
On Wed, Mar 25, 2020 at 2:32 AM Wes McKinney  wrote:

> From what I've found searching on the internet
>
> - Java:
> * ZSTD -- JNI-based library available
> * LZ4 -- both JNI and native Java available
>
> - Go: ZSTD is a C binding, while there is an LZ4 native Go implementation
>
AFAIK, one has access to pure-Go packages for both of these compressors:
- github.com/pierrec/lz4
- github.com/klauspost/compress

-s


Re: Issue with GitHub PR

2018-10-22 Thread Sebastien Binet
hi,

On Mon, Oct 22, 2018 at 4:54 PM paddy horan  wrote:

> Hey all,
>
> I created a PR for ARROW-3541; after addressing review comments I rebased
> and force pushed to my branch. GitHub seems to be having issues though: the
> PR is not updating and I don’t believe CI was re-triggered. Looking at the
> PR now, comments I made this morning are not showing up and comments I
> deleted because GitHub posted them multiple times are back.
>
> I know we have tooling that relies on the PR name, for instance in JIRA
> the pull-request-available tag has been added to the issue.  Can I rename
> and abandon the PR so I can open a new PR with the correct name to try and
> get the CI to trigger or will this mess up our tooling?
>

github has been having issues since Sunday:
- https://status.github.com/messages

they are in the middle of recovering, but in the meantime they've
disabled webhooks, so I doubt resending a PR will work.

-s


Re: Sync today right now

2018-11-01 Thread Sebastien Binet
Hi Brendan,

On Thu, Nov 1, 2018 at 1:43 PM Brendan O'Brien  wrote:

> Being pointed at the IPC protocol issue, and breaking it down is the exact
> conversation we're hoping to have.
>
> We're a few Go devs in Brooklyn, all *very* excited about Arrow, and would
> need IPC for our use case. We'd be happy to roll up our sleeves on this one
> in the off-hours.
>
> It seems Sebastien Binet & Stuart Carnie are doing great work on the Go
> implementation. Ideally we'd stay out of their way and grab a small Go
> sub-task to use as a working introduction and make sure our code practices
> align. If that task is on the roadmap for IPC support, all the better. In
> terms of dev bandwidth, we're limited now, but should be able to put forth
> more sizable contributions in December. It'd be great if we're known
> entities by then.
>

I'll also need Arrow-IPC support for one of my other CERN-based
applications, but that application isn't at the top of my TODO list yet
(I'll probably circle back to it at the beginning of December),
so I'd be happy to review PRs related to adding IPC support in the meantime :)

I haven't yet looked at what adding IPC support entails in detail, hence
the rather generic and vague JIRA ticket.
breaking it into smaller items sounds like useful work indeed.

as for small-ish introductory sub-tasks, I can think of e.g. the
implementation of Time{32,64} and Date{32,64} arrays:
- https://issues.apache.org/jira/browse/ARROW-3672
- https://issues.apache.org/jira/browse/ARROW-3673
- https://issues.apache.org/jira/browse/ARROW-3674
- https://issues.apache.org/jira/browse/ARROW-3675

happy to see more gophers showing up :)

-s


> Apache arrow is an incredible project. Thanks to all for your
> contributions.
>
> On Thu, Nov 1, 2018 at 8:10 AM Wes McKinney  wrote:
>
> > On the Go point, it sounds like this means implementing the IPC protocol:
> >
> > https://issues.apache.org/jira/browse/ARROW-3679
> >
> > There are many layers to this, from metadata serialization to record
> > batch reconstruction, so it may make sense to create some sub-tasks to
> > make the problem a bit less monolithic.
> >
> > Would also be great to get Go to participate in the integration test suite.
> > On Wed, Oct 31, 2018 at 12:17 PM Jacques Nadeau 
> > wrote:
> > >
> > > Recap: Short one today.
> > >
> > > # Attendees
> > > Jacques
> > > Pearu
> > > Brendan
> > > Li
> > >
> > > # Topics
> > > ## Go flatbuffers support:
> > > Brendan and company are interested in contributing this. They want to
> > > discuss the approach with existing Go developers. The recommendation was
> > > to start a thread on the mailing list and then create follow-up JIRAs as
> > > tasks are identified.
> > >
> > > ## Better Dictionary Support in Java
> > > Li has some ideas on this but they are complex. Plans to write up a
> > > proposal for the mailing list.
> > >
> > > On Wed, Oct 31, 2018 at 9:02 AM Jacques Nadeau wrote:
> > >
> > > > https://meet.google.com/vtm-teks-phx
> > > >
> >
>
>
> --
> --
> Brendan O'Brien
> caretaker, qri.io
> twitter.com/b_fiive
> meeting avail: https://calendly.com/b_five
>


Re: [Go] High memory usage on CSV read into table

2018-11-19 Thread Sebastien Binet
hi Daniel,
On Sun, Nov 18, 2018 at 10:17 PM Daniel Harper  wrote:

> Sorry just realised SVG doesn't work.
>
> PNG of the pprof can be found here: https://i.imgur.com/BVXv1Jm.png
>
>
> Daniel Harper
> http://djhworld.github.io
>
>
> On Sun, 18 Nov 2018 at 21:07, Daniel Harper  wrote:
>
> > Wasn't sure where the best place to discuss this, but I've noticed that
> > when running the following piece of code
> >
> > https://play.golang.org/p/SKkqPWoHPPS
> >
> > on a CSV file that contains roughly 1 million records (about 100MB of
> > data), the memory usage of the process leaps to about 9.1GB.
> >
> > The records look something like this
> >
> >
> > "2018-08-27T20:00:00Z","cdnA","dash","audio","http","programme-1","3577","2018","08","27","2018-08-27","live"
> > "2018-08-27T20:00:01Z","cdnB","hls","video","https","programme-2","14","2018","08","27","2018-08-27","ondemand"
> >
> > I've attached a pprof output of the process.
> >
> > From the looks of it the heavy use of _strings_ might be where most of the
> > memory is going.
> >
> > Is this expected? I'm new to the code, happy to help where possible!
>

it's somewhat expected.

you use `ioutil.ReadFile` to get your data.
this reads the whole file into memory and keeps it there: so there's that.
for much bigger files, I would recommend using `os.Open` and handing the
resulting file to the CSV reader, so that it is read incrementally.

also, you don't release the individual records once passed to the table, so
you have a memory leak.
here is my current attempt:
- https://play.golang.org/p/ns3GJW6Wx3T

finally, as I was alluding to on the #data-science slack channel, right now
Go arrow/csv will create a new Record for each row in the incoming CSV file.
so you get a bunch of overhead for every row/record.

a much more efficient way would be to chunk `n` rows into a single Record.
an even more efficient way would be to create a dedicated csv.table type
that implements array.Table (as it seems you're interested in using that
interface) but only reads the incoming CSV file piecewise (ie: implementing
the chunking I was alluding to above but w/o having to load the whole
[]Record slice.)

as a first step to improve this issue, implementing chunking would already
shave off a bunch of overhead.

-s


Re: [Go] High memory usage on CSV read into table

2018-11-23 Thread Sebastien Binet
On Mon, Nov 19, 2018 at 11:29 PM Wes McKinney  wrote:

> That seems buggy then. There are only 4.125 bytes of overhead per
> string value on average (a 32-bit offset, plus a valid bit)
> On Mon, Nov 19, 2018 at 5:02 PM Daniel Harper 
> wrote:
> >
> > Uncompressed
> >
> > $ ls -la concurrent_streams.csv
> > -rw-r--r-- 1 danielharper 112M Nov 16 19:21 concurrent_streams.csv
> >
> > $ wc -l concurrent_streams.csv
> >  1007481 concurrent_streams.csv
> >
> >
> > Daniel Harper
> > http://djhworld.github.io
> >
> >
> > On Mon, 19 Nov 2018 at 21:55, Wes McKinney  wrote:
> >
> > > I'm curious how the file is only 100MB if it's producing ~6GB of
> > > strings in memory. Is it compressed?
> > > On Mon, Nov 19, 2018 at 4:48 PM Daniel Harper 
> > > wrote:
> > > >
> > > > Thanks,
> > > >
> > > > I've tried the new code and that seems to have shaved about 1GB of memory
> > > > off, so the heap is about 8.84GB now. Here is the updated pprof output:
> > > > https://i.imgur.com/itOHqBf.png
> > > >
> > > > It looks like the majority of allocations are in the memory.GoAllocator
> > > >
> > > > (pprof) top
> > > > Showing nodes accounting for 8.84GB, 100% of 8.84GB total
> > > > Showing top 10 nodes out of 41
> > > >       flat  flat%   sum%        cum   cum%
> > > >     4.24GB 47.91% 47.91%     4.24GB 47.91%  github.com/apache/arrow/go/arrow/memory.(*GoAllocator).Allocate
> > > >     2.12GB 23.97% 71.88%     2.12GB 23.97%  github.com/apache/arrow/go/arrow/memory.NewResizableBuffer (inline)
> > > >     1.07GB 12.07% 83.95%     1.07GB 12.07%  github.com/apache/arrow/go/arrow/array.NewData
> > > >     0.83GB  9.38% 93.33%     0.83GB  9.38%  github.com/apache/arrow/go/arrow/array.NewStringData
> > > >     0.33GB  3.69% 97.02%     1.31GB 14.79%  github.com/apache/arrow/go/arrow/array.(*BinaryBuilder).newData
> > > >     0.18GB  2.04% 99.06%     0.18GB  2.04%  github.com/apache/arrow/go/arrow/array.NewChunked
> > > >     0.07GB  0.78% 99.85%     0.07GB  0.78%  github.com/apache/arrow/go/arrow/array.NewInt64Data
> > > >     0.01GB  0.15%   100%     0.21GB  2.37%  github.com/apache/arrow/go/arrow/array.(*Int64Builder).newData
> > > >          0     0%   100%        6GB 67.91%  github.com/apache/arrow/go/arrow/array.(*BinaryBuilder).Append
> > > >          0     0%   100%     4.03GB 45.54%  github.com/apache/arrow/go/arrow/array.(*BinaryBuilder).Reserve
> > > >
> > > >
> > > > I'm a bit busy at the moment but I'll probably repeat the same test on
> > > > the other Arrow implementations (e.g. Java) to see if they allocate a
> > > > similar amount.
>
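
Wes's 4.125-byte figure follows from the layout of an Arrow string array: one validity bit per value plus one int32 offset per value (and one final offset), on top of the character data itself. A quick stdlib-only Go check, using a hypothetical helper that ignores the buffer padding real implementations add:

```go
package main

import "fmt"

// stringArrayBytes (hypothetical) returns the sizes of the three buffers of an
// Arrow string array holding n values with dataLen bytes of characters,
// ignoring alignment padding.
func stringArrayBytes(n, dataLen int) (validity, offsets, data int) {
	validity = (n + 7) / 8 // one bit per value
	offsets = (n + 1) * 4  // n+1 int32 offsets
	data = dataLen         // the UTF-8 bytes themselves
	return validity, offsets, data
}

func main() {
	const n = 1_000_000
	v, o, _ := stringArrayBytes(n, 0)
	fmt.Printf("%.3f bytes of overhead per value\n", float64(v+o)/n) // 4.125 bytes of overhead per value
}
```

For a file of roughly a million rows, that is on the order of 4MB of overhead per string column, which is why a multi-GB heap pointed at something other than the format itself.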

I've implemented chunking over there:

- https://github.com/apache/arrow/pull/3019

could you try with a couple of chunking values?
e.g.:
- csv.WithChunk(-1): reads the whole file into memory and creates one big record
- csv.WithChunk(nrows/10): creates 10 records

also, it would be great to try to disentangle the memory usage of the
"CSV reading part" from the "Table creation" one:
- have some perf numbers w/o storing all these Records into a []Record slice,
- have some perf numbers w/ only storing these Records into a []Record slice,
- have some perf numbers w/ storing the records into the slice + creating the Table.
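
One way to get those per-phase numbers is to bracket each phase with `runtime.ReadMemStats`. A minimal stdlib sketch under that assumption; the phase shown is a placeholder allocation, and in practice `f` would be the CSV reading loop, the record collection, or the table construction. `TotalAlloc` counts cumulative bytes allocated, so a GC during `f` cannot make the result negative; `HeapAlloc` (the live heap) is the figure to look at for memory that stays retained.

```go
package main

import (
	"fmt"
	"runtime"
)

// sink keeps the placeholder allocation reachable so it is not optimized away.
var sink []byte

// allocatedBy (hypothetical) reports how many bytes were allocated on the
// heap while f ran.
func allocatedBy(f func()) uint64 {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	f()
	runtime.ReadMemStats(&after)
	return after.TotalAlloc - before.TotalAlloc
}

func main() {
	n := allocatedBy(func() { sink = make([]byte, 1<<20) })
	fmt.Printf("phase allocated %d bytes\n", n)
}
```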

hth,
-s


Re: [Go] High memory usage on CSV read into table

2018-12-04 Thread Sebastien Binet
On Tue, Dec 4, 2018 at 10:23 PM Daniel Harper  wrote:

> Sorry I've been away at reinvent.
>
> Just tried out what's currently on master (with the chunked change that
> looks like it has merged). I'll do the breakdown of the different parts
> later, but as a high-level look at just running the same script as described
> above, these are the numbers:
>
>
> https://docs.google.com/spreadsheets/d/1SE4S-wcKQ5cwlHoN7rQm7XOZLjI0HSyMje6q-zLvUHM/edit?usp=sharing
>


>
> Looks to me like the change has definitely helped, with memory usage
> dropping to around 300mb, although the usage doesn't really change that
> much once chunk size is > 1000
>

good. you might want to try with a chunk size of -1 (this loads the whole
CSV file into memory in one fell swoop.)

also, there's this PR which should probably also reduce the memory pressure:
- https://github.com/apache/arrow/pull/3073

cheers,
-s


>
>
>
>
> Daniel Harper
> http://djhworld.github.io
>
>


[Go] Npyio / Arrow support

2018-12-07 Thread Sebastien Binet
hi there,

Back in the day, for my particle physics work, I had to be able to read
and write NumPy data files[1] (for interop with existing analysis
pipelines).

I was wondering whether I could integrate this Go package with Apache
Arrow:
- https://github.com/sbinet/npyio

and put it under github.com/apache/arrow/go as either
- github.com/apache/arrow/go/npyio, or
- github.com/apache/arrow/go/arrow/npyio (as a sibling of the already
existing csv package.)

that new npyio package would have built-in support for array.Interface and
tensor.Interface.

thoughts?

cheers,
-s

[1]: http://docs.scipy.org/doc/numpy/neps/npy-format.html


Re: Reviewing PRs (was: Re: Arrow sync call)

2018-12-12 Thread Sebastien Binet
On Wed, Dec 12, 2018 at 7:25 PM Antoine Pitrou  wrote:

>
> Hi,
>
> Now that we have a lot of different implementations and a growing number
> of assorted topics, it becomes hard to know whether a PR or issue has a
> dedicated expert or would benefit from an outsider look.
>
> In Python we have what we call the "experts" list which is a per-topic
> (or per-library module) contributors who are generally interested in and
> competent on such topic (*).  So it's possible to cc such a person, or
> if no expert is available on a given topic, perhaps for someone else to
> try and have a look anyway.  Perhaps we need something similar for Arrow?
>

with github, one can also create "teams" and "@" them.
we could perhaps create @arrow-py, @arrow-cxx, @arrow-go, ...
this dilutes responsibility a bit, but it also narrows the net that's
cast.

-s


> (*) https://devguide.python.org/experts/
>
> Regards
>
> Antoine.
>
>
>
> On 12/12/2018 at 19:13, Ravindra Pindikura wrote:
> > Attendees: Wes, Sidd, Bryan, Francois, Hatem, Nick, Shyam, Ravindra, Matt
> >
> > Wes:
> > - do not rush the 0.12 release before the holidays, instead target the release for early next year
> > - request everyone to look at PRs in the queue, and help by doing reviews
> >
> > Wes/Nick
> > - queried about interest in developing a "dataset abstraction" as a layer above file readers that arrow now supports (parquet, csv, json)
> >
> > Sidd
> > - agreed to be the release manager for 0.12
> > - things to keep in mind for release managers:
> >  1. We now use crossbow to automate the building of binaries with CI
> >  2. From this release, the binary artifacts will be hosted in bintray instead of apache dist since the size has increased significantly
> >
> > Hatem
> > - Asked about documentation regarding IDE for setup/debug of arrow libraries
> > - Wes pointed out the developer wiki on confluence. Hatem offered to help with documentation.
> >
> > Thanks and regards,
> > Ravindra.
> >
> > On 2018/12/12 16:54:21, Wes McKinney  wrote:
> >> All are welcome to join -- call notes will be posted after
> >>
> >> https://meet.google.com/vtm-teks-phx
>


Re: Format specification document?

2019-01-04 Thread Sebastien Binet
Hi,

Theoretically, it's defined there:

- https://arrow.apache.org/docs/ipc.html
- https://arrow.apache.org/docs/metadata.html

hth,
-s


sent from my droid


On Fri, Jan 4, 2019, 02:15 Kohei KaiGai wrote:

> Hello,
>
> I'm now trying to understand the Apache Arrow format for my application.
> Is there a format specification document including meta-data layout?
>
> I checked out the description at:
> https://github.com/apache/arrow/tree/master/docs/source/format
> https://github.com/apache/arrow/tree/master/format
>
> The format/IPC.rst says an arrow file has the format below:
>
> 
> 
> 
> 
> 
> 
>
> Then, STREAMING FORMAT begins from SCHEMA-message.
> The message chunk has the format below:
>
> 
> 
> 
> 
>
> I made an arrow file using pyarrow [*1]. It has the following binary.
>
> [kaigai@saba ~]$ cat /tmp/sample.arrow | od -Ax -t x1 | head -16
> 00  41 52 52 4f 57 31 00 00 8c 05 00 00 10 00 00 00
> 10  00 00 0a 00 0e 00 06 00 05 00 08 00 0a 00 00 00
> 20  00 01 03 00 10 00 00 00 00 00 0a 00 0c 00 00 00
> 30  04 00 08 00 0a 00 00 00 ec 03 00 00 04 00 00 00
> 40  01 00 00 00 0c 00 00 00 08 00 0c 00 04 00 08 00
> 50  08 00 00 00 08 00 00 00 10 00 00 00 06 00 00 00
> 60  70 61 6e 64 61 73 00 00 b4 03 00 00 7b 22 70 61
> 70  6e 64 61 73 5f 76 65 72 73 69 6f 6e 22 3a 20 22
> 80  30 2e 32 32 2e 30 22 2c 20 22 63 6f 6c 75 6d 6e
> 90  73 22 3a 20 5b 7b 22 6d 65 74 61 64 61 74 61 22
> a0  3a 20 6e 75 6c 6c 2c 20 22 6e 75 6d 70 79 5f 74
> b0  79 70 65 22 3a 20 22 69 6e 74 36 34 22 2c 20 22
> c0  6e 61 6d 65 22 3a 20 22 69 64 22 2c 20 22 66 69
> d0  65 6c 64 5f 6e 61 6d 65 22 3a 20 22 69 64 22 2c
> e0  20 22 70 61 6e 64 61 73 5f 74 79 70 65 22 3a 20
> f0  22 69 6e 74 36 34 22 7d 2c 20 7b 22 6d 65 74 61
>
> The first 64 bits are "ARROW1\0\0", and the next 32 bits are 0x058c (=1420),
> which is reasonable for the SCHEMA-message length.
> The next 32 bits are 0x0010 (=16). It may be the metadata_size of the FlatBuffer.
> The IPC.rst does not mention the FlatBuffer metadata, so I tried to skip the
> next 16 bytes, expecting the message body to begin at 0x20.
> However, the first 16 bits (version) are 0x0001 (=V2), the next byte is 0x03
> (= RecordBatch, not Schema!), and the following 64 bits are
> 0x0a0010(!).
> Obviously I'm understanding something incorrectly.
>
> Is there documentation describing the detailed layout of the arrow
> format?
>
> Thanks,
>
> [*1] Steps to make a sample arrow file
> $ python3.5
> >>> import pyarrow as pa
> >>> import pandas as pd
> >>> X = pd.read_sql(sql="SELECT * FROM hogehoge LIMIT 1000",
> con="postgresql://localhost/postgres")
> >>> Y = pa.Table.from_pandas(X)
> >>> f = pa.RecordBatchFileWriter('/tmp/sample.arrow', Y.schema)
> >>> f.write_table(Y)
> >>> f.close()
>
> --
> HeteroDB, Inc / The PG-Strom Project
> KaiGai Kohei 
>
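
The dump above can be decoded by following the Flatbuffers binary rules. This file predates the continuation marker, so after the 8-byte magic (`ARROW1` plus two padding bytes) comes the int32 metadata length (0x058c = 1420), and the Flatbuffer starts immediately after it. The 0x10 that follows is not a size: it is the Flatbuffer's root offset, i.e. the Message table sits 16 bytes into the Flatbuffer. The table's first int32 points back to its vtable, which in turn gives the positions of the `version` and `header_type` fields. A stdlib-only Go sketch of that walk (hypothetical helpers with minimal bounds checking, not a general Flatbuffers reader; the 0xFFFFFFFF continuation check for newer files is an assumption taken from the current IPC spec) yields version 3 (MetadataVersion V4) and header_type 1 (Schema) on these bytes:

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

var errShort = errors.New("truncated input")

// firstMessage (hypothetical) skips the file magic and the length prefix of
// the first message, returning the metadata length and the Flatbuffer bytes.
func firstMessage(b []byte) (int32, []byte, error) {
	if len(b) < 8 || string(b[:6]) != "ARROW1" {
		return 0, nil, errors.New("missing ARROW1 magic")
	}
	p := b[8:] // 6-byte magic + 2 padding bytes
	if len(p) >= 4 && binary.LittleEndian.Uint32(p) == 0xFFFFFFFF {
		p = p[4:] // newer writers emit a continuation marker first (assumed value)
	}
	if len(p) < 4 {
		return 0, nil, errShort
	}
	return int32(binary.LittleEndian.Uint32(p)), p[4:], nil
}

// messageHeader reads version (field 0) and header_type (field 1) of the root
// Message table in a Flatbuffer.
func messageHeader(fb []byte) (version uint16, headerType uint8, err error) {
	if len(fb) < 4 {
		return 0, 0, errShort
	}
	root := int(binary.LittleEndian.Uint32(fb)) // uoffset to the root table
	if root+4 > len(fb) {
		return 0, 0, errShort
	}
	// A table starts with an int32 soffset; its vtable lives at table - soffset.
	vt := root - int(int32(binary.LittleEndian.Uint32(fb[root:])))
	if vt < 0 || vt+4 > len(fb) {
		return 0, 0, errShort
	}
	vtSize := int(binary.LittleEndian.Uint16(fb[vt:]))
	// field returns the offset of field i inside the table, or 0 if absent.
	field := func(i int) int {
		pos := vt + 4 + 2*i
		if pos+2 > vt+vtSize || pos+2 > len(fb) {
			return 0
		}
		return int(binary.LittleEndian.Uint16(fb[pos:]))
	}
	if off := field(0); off != 0 {
		if root+off+2 > len(fb) {
			return 0, 0, errShort
		}
		version = binary.LittleEndian.Uint16(fb[root+off:])
	}
	if off := field(1); off != 0 {
		if root+off >= len(fb) {
			return 0, 0, errShort
		}
		headerType = fb[root+off]
	}
	return version, headerType, nil
}

func main() {
	// The first 48 bytes of /tmp/sample.arrow from the dump above.
	dump := []byte{
		0x41, 0x52, 0x52, 0x4f, 0x57, 0x31, 0x00, 0x00, 0x8c, 0x05, 0x00, 0x00, 0x10, 0x00, 0x00, 0x00,
		0x00, 0x00, 0x0a, 0x00, 0x0e, 0x00, 0x06, 0x00, 0x05, 0x00, 0x08, 0x00, 0x0a, 0x00, 0x00, 0x00,
		0x00, 0x01, 0x03, 0x00, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00, 0x0a, 0x00, 0x0c, 0x00, 0x00, 0x00,
	}
	metaLen, fb, err := firstMessage(dump)
	if err != nil {
		panic(err)
	}
	version, headerType, err := messageHeader(fb)
	if err != nil {
		panic(err)
	}
	fmt.Println(metaLen, version, headerType) // 1420 3 1
}
```

In other words, the bytes read at 0x20 were vtable and field data rather than the start of the message, and the first message really is the Schema.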


Re: [Format] [Rust] ChunkedArray, Column and Table

2019-01-28 Thread Sebastien Binet
On Sun, Jan 27, 2019 at 1:08 PM Neville Dipale 
wrote:

> Hi Antoine,
>
> I've given your response some thought.
>
> I'm thinking more about the computational aspect of Arrow. I agree
> that for representing and sharing data, RecordBatches achieve the purpose.
>
> I came across ChunkedArray, Column and Table while I was trying to create a
> dataframe library in Rust. The other languages already benefit from having
> these 3 implemented, but for Rust I've had to try to create them myself.
> This is what led me to ask the question, because the various languages
> that I've seen so far seem to follow the same kind of standard re. both
> the structure and the methods to create/interact with chunked arrays, columns,
> and tables.
>
> [1] Go Tables:
> https://github.com/apache/arrow/blob/master/go/arrow/array/table.go


there's also this WIP dataframe package being built on top of Arrow:
-  https://github.com/gonum/exp/pull/19

-s


> [2] CPP Tables:
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/table.cc
> [3] JS Tables: https://github.com/apache/arrow/blob/master/js/src/table.ts
> [4] Ruby:
>
> https://github.com/apache/arrow/blob/master/ruby/red-arrow/lib/arrow/table.rb
> [5] Python, pyarrow.Table
>
> While going through the source, I didn't find anything for Java, and that's
> swayed me to think that maybe Tables don't need standardising as each
> implementation would likely implement them differently (or not implement
> them).
>
> Regards
> Neville
>
> On Fri, 25 Jan 2019 at 20:56, Antoine Pitrou  wrote:
>
> >
> > Hello Neville,
> >
> > I don't know if Tables need standardizing.  Record Batches are part of
> > the spec (*), and they are the basic block for exchanging and sharing
> > tabular data.  Depending on your application, you might exchange a
> > stream of Record Batches, or a fixed-length sequence thereof (in which
> > case you have a "Table").
> >
> > (*) see https://arrow.apache.org/docs/metadata.html
> >
> > (reading that spec though, it's not obvious to me why the Record Batch
> > definition doesn't reference a Schema)
> >
> > Regards
> >
> > Antoine.
> >
> >
> > On 25/01/2019 at 19:48, Neville Dipale wrote:
> > > Hi Arrow developers,
> > >
> > > I've been looking at the various language impls, and although a Table
> > > isn't currently part of the spec, it seems to be implemented in CPP,
> > > Python, Go, JS (and perhaps other languages).
> > >
> > > Are there plans of standardising these and adding them to the spec?
> > >
> > > I'm asking because I'm working on a dataframe implementation for Rust (
> > > https://github.com/nevi-me/rust-dataframe), and I've started trying to
> > > implement columns and tables with the intention to upstream them if I get
> > > them right.
> > >
> > > Regards
> > > Neville
> > >
> >
>
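To make Antoine's point concrete, here is a toy Go model of a Table as a schema plus a finite sequence of record batches that share it. It uses plain structs only (`Field`, `Schema`, `RecordBatch` and `Table` here are hypothetical stand-ins, not the arrow/go `array.Table`/`array.Record` types) and a single int64 column type to keep it short. Note how each table column naturally becomes a list of chunks, one per batch, with no copying.

```go
package main

import "fmt"

// Field and Schema are toy stand-ins describing column names and types.
type Field struct {
	Name string
	Type string
}

type Schema struct {
	Fields []Field
}

// RecordBatch is a toy record batch: all columns have the same length.
type RecordBatch struct {
	Schema  *Schema
	Columns [][]int64
}

func (rb RecordBatch) NumRows() int {
	if len(rb.Columns) == 0 {
		return 0
	}
	return len(rb.Columns[0])
}

// Table is a toy table: a schema plus a finite sequence of batches that
// share it.
type Table struct {
	Schema  *Schema
	Batches []RecordBatch
}

// NumRows sums the row counts of all batches.
func (t Table) NumRows() int {
	n := 0
	for _, b := range t.Batches {
		n += b.NumRows()
	}
	return n
}

// Column returns column i as a list of chunks (one per batch), sharing
// the batches' backing slices instead of copying them.
func (t Table) Column(i int) [][]int64 {
	chunks := make([][]int64, 0, len(t.Batches))
	for _, b := range t.Batches {
		chunks = append(chunks, b.Columns[i])
	}
	return chunks
}

func main() {
	schema := &Schema{Fields: []Field{{"f1", "int64"}, {"f2", "int64"}}}
	tbl := Table{Schema: schema, Batches: []RecordBatch{
		{Schema: schema, Columns: [][]int64{{1, 2, 3}, {10, 20, 30}}},
		{Schema: schema, Columns: [][]int64{{4, 5}, {40, 50}}},
	}}
	fmt.Println(tbl.NumRows()) // 5
	fmt.Println(tbl.Column(0)) // [[1 2 3] [4 5]]
}
```

Streaming use cases would consume the batches one at a time instead of materializing the whole `Table`.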


Re: Preferred way to cite Apache Arrow?

2019-01-28 Thread Sebastien Binet
what about submitting a paper to the Journal of Open Source Software (JOSS)?
- https://joss.theoj.org/

cheers,
-s


On Mon, Jan 28, 2019 at 4:02 PM Wes McKinney  wrote:

> hi Jim,
>
> We don't have a canonical citation yet. I'd like to write an academic
> paper about the project this year or next, so hopefully this will
> change, but I think you can cite the website in a publication in the
> meantime.
>
> - Wes
>
> On Mon, Jan 28, 2019 at 8:49 AM Jim Pivarski  wrote:
> >
> > Is there a preferred reference (paper, proceedings, Zenodo link) to use
> > when citing Apache Arrow? I couldn't find any on the Arrow website.
> >
> > Thanks,
> > -- Jim
>


Re: Arrow on WebAssembly

2019-02-19 Thread Sebastien Binet
Franco,

On Tue, Feb 19, 2019 at 8:31 PM Brian Hulette  wrote:

> Hi Franco,
> I'm not aware of anyone trying this in Rust, but Tim Paine at JPMC recently
> contributed a patch [1] to make it possible to compile the C++
> implementation with emscripten, so that he could use it in Perspective [2].
> Could you use the C++ lib instead?
>
> It would be great if either implementation could target WebAssembly though
> - do any Rust contributors know more about the libc/wasm issue? Maybe the
> rustwasm community [3] could be of assistance?
>

there's also the Go backend :)
I've just tried compiling this example:
-  https://godoc.org/github.com/apache/arrow/go/arrow#example-package--Table
to wasm.
compilation went fine:

$> GOOS=js GOARCH=wasm go build -o foo.wasm foo.go
$> go-wasm ./foo.wasm
rec[0]["f1-i32"]: [1 2 3 4 5]
rec[0]["f2-f64"]: [1 2 3 4 5]
rec[1]["f1-i32"]: [6 7 8 (null) 10]
rec[1]["f2-f64"]: [6 7 8 9 10]
rec[2]["f1-i32"]: [11 12 13 14 15]
rec[2]["f2-f64"]: [11 12 13 14 15]
rec[3]["f1-i32"]: [16 17 18 19 20]
rec[3]["f2-f64"]: [16 17 18 19 20]

and it ran fine once this patch was added:
- https://github.com/apache/arrow/pull/3707

hth,
-s

PS: go-wasm is an alias of mine for this file:
https://github.com/golang/go/blob/master/misc/wasm/go_js_wasm_exec
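The `(null)` entry in that output is driven by the array's validity bitmap. As a toy illustration (stdlib only, not the arrow/go formatter), the sketch below decodes an Arrow-style bitmap, in which slot `i` is bit `i%8` of byte `i/8` (least-significant bit first) and a set bit means "valid", and renders a column in the same `[6 7 8 (null) 10]` style. `isValid` and `formatInt32` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"strings"
)

// isValid reports whether slot i is non-null under the Arrow validity
// bitmap convention (LSB-first bit order, 1 = valid). A nil bitmap means
// every slot is valid.
func isValid(bitmap []byte, i int) bool {
	if bitmap == nil {
		return true
	}
	return bitmap[i/8]&(1<<uint(i%8)) != 0
}

// formatInt32 renders a nullable int32 column, printing "(null)" for null
// slots. The value stored under a null slot is ignored.
func formatInt32(values []int32, bitmap []byte) string {
	var sb strings.Builder
	sb.WriteByte('[')
	for i, v := range values {
		if i > 0 {
			sb.WriteByte(' ')
		}
		if isValid(bitmap, i) {
			fmt.Fprintf(&sb, "%d", v)
		} else {
			sb.WriteString("(null)")
		}
	}
	sb.WriteByte(']')
	return sb.String()
}

func main() {
	// Slot 3 is null: bits 0, 1, 2 and 4 are set -> 0b00010111 = 0x17.
	vals := []int32{6, 7, 8, 0, 10}
	fmt.Println(formatInt32(vals, []byte{0x17})) // [6 7 8 (null) 10]
}
```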


> Brian
>
> [1] https://github.com/apache/arrow/pull/3350
> [2] https://github.com/jpmorganchase/perspective
> [3] https://github.com/rustwasm/team
>
> On Tue, Feb 19, 2019 at 11:06 AM Franco Nicolas Bellomo <
> fnbell...@gmail.com>
> wrote:
>
> > Hi!
> >
> > Apache Arrow currently has a really nice implementation in Rust. I
> > tried to compile it to WebAssembly, but I ran into a problem with libc. I
> > understand that this is a general problem with libc and wasm.
> > Is WebAssembly support on the Arrow roadmap?
> >
> > Thanks!!
> >
>


GSoC 2019 with a bit of Apache Arrow

2019-03-05 Thread Sebastien Binet
Hi there,

Just to let you know CERN has been accepted as a GSoC organization this
year.

As such, I have submitted a proposal that's loosely connected to Apache
Arrow (and Go.)

Here's the proposal:
https://hepsoftwarefoundation.org/gsoc/2019/proposal_GoHEPgroot.html

It's mostly about *using* Arrow, not really developing it (but there's
always some possibility of a "side" PR to land)...

Cheers,
-s

sent from my droid


round-trip tests for Arrow files

2019-04-03 Thread Sebastien Binet
hi there,

I am working on the deserialization support for the Go backend.
at this point, I have (I think) primitive and binary/string arrays working
with a simple Arrow file I created like so:

import pyarrow as pa

data = [
    pa.array([1, 2, 3, None, 5], type="i4"),
    pa.array(['foo', 'bar', 'baz', None, "quux"]),
    pa.array([1, 2, None, 4, 5], type="f4"),
    pa.array([True, None, False, True, False])
]

batch = pa.RecordBatch.from_arrays(data, ['f0', 'f1', 'f2', "f3"])
sink = pa.BufferOutputStream()
writer = pa.RecordBatchFileWriter(sink, batch.schema)

# write the same batch 5 times, then close to finalize the file footer
for i in range(5):
    writer.write_batch(batch)
writer.close()

buf = sink.getvalue()
with open("out.dat", "wb") as f:
    f.write(buf.to_pybytes())

and, as I said, I can now successfully read that back from Go.
but I was wondering what's the recommended way to test for this kind of
round-trip/cross-language thing.
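One subtlety when asserting that Go reads back exactly what pyarrow wrote: Arrow leaves the bytes under a null slot unspecified, so a round-trip check should compare validity everywhere but values only where they are valid. A toy stdlib-only sketch, assuming a hypothetical `NullableInt32` column that uses a `[]bool` in place of a packed bitmap, with column `f0` from the file above as data:

```go
package main

import "fmt"

// NullableInt32 is a toy column: values plus one validity flag per slot.
type NullableInt32 struct {
	Values []int32
	Valid  []bool
}

// equalIgnoringNullSlots requires equal lengths and identical validity,
// but compares values only for valid slots, since the content of a null
// slot is unspecified in Arrow.
func equalIgnoringNullSlots(a, b NullableInt32) bool {
	if len(a.Values) != len(b.Values) ||
		len(a.Valid) != len(a.Values) || len(b.Valid) != len(b.Values) {
		return false
	}
	for i := range a.Values {
		if a.Valid[i] != b.Valid[i] {
			return false
		}
		if a.Valid[i] && a.Values[i] != b.Values[i] {
			return false
		}
	}
	return true
}

func main() {
	// Column f0 as written by pyarrow: [1, 2, 3, None, 5].
	want := NullableInt32{Values: []int32{1, 2, 3, 0, 5}, Valid: []bool{true, true, true, false, true}}
	// A reader may leave arbitrary bytes under the null slot.
	got := NullableInt32{Values: []int32{1, 2, 3, 42, 5}, Valid: []bool{true, true, true, false, true}}
	fmt.Println(equalIgnoringNullSlots(want, got)) // true
}
```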

I tried to play a bit with "integration/integration_test.py" but it fails
with:

##
C++ producing, C++ consuming
##
==
Testing file /home/binet/work/gonum/src/github.com/apache/arrow/integration/data/struct_example.json
==
-- Creating binary inputs
/home/binet/work/gonum/src/github.com/apache/arrow/integration/../cpp/build/latest/arrow-json-integration-test
--integration --arrow=testdata/75bc5ca6_struct_example.json_as_file
--json=/home/binet/work/gonum/src/github.com/apache/arrow/integration/data/struct_example.json
--mode=JSON_TO_ARROW
Command failed: /home/binet/work/gonum/src/github.com/apache/arrow/integration/../cpp/build/latest/arrow-json-integration-test
--integration --arrow=testdata/75bc5ca6_struct_example.json_as_file
--json=/home/binet/work/gonum/src/github.com/apache/arrow/integration/data/struct_example.json
--mode=JSON_TO_ARROW
With output:
--
Found schema: struct_nullable: struct

--
==
Testing file /home/binet/work/gonum/src/github.com/apache/arrow/integration/data/simple.json
==
-- Creating binary inputs
/home/binet/work/gonum/src/github.com/apache/arrow/integration/../cpp/build/latest/arrow-json-integration-test
--integration --arrow=testdata/b0d388ed_simple.json_as_file
--json=/home/binet/work/gonum/src/github.com/apache/arrow/integration/data/simple.json --mode=JSON_TO_ARROW
Command failed: /home/binet/work/gonum/src/github.com/apache/arrow/integration/../cpp/build/latest/arrow-json-integration-test
--integration --arrow=testdata/b0d388ed_simple.json_as_file
--json=/home/binet/work/gonum/src/github.com/apache/arrow/integration/data/simple.json --mode=JSON_TO_ARROW
With output:
--
Found schema: foo: int32
bar: double
baz: string

--
==
Testing file testdata/generated_primitive.json
==
-- Creating binary inputs
/home/binet/work/gonum/src/github.com/apache/arrow/integration/../cpp/build/latest/arrow-json-integration-test
--integration --arrow=testdata/8eacb124_generated_primitive.json_as_file
--json=testdata/generated_primitive.json --mode=JSON_TO_ARROW
Command failed: /home/binet/work/gonum/src/github.com/apache/arrow/integration/../cpp/build/latest/arrow-json-integration-test
--integration --arrow=testdata/8eacb124_generated_primitive.json_as_file
--json=testdata/generated_primitive.json --mode=JSON_TO_ARROW
With output:
--
Found schema: bool_nullable: bool
bool_nonnullable: bool not null
int8_nullable: int8
int8_nonnullable: int8 not null
int16_nullable: int16
int16_nonnullable: int16 not null
int32_nullable: int32
int32_nonnullable: int32 not null
int64_nullable: int64
int64_nonnullable: int64 not null
uint8_nullable: uint8
uint8_nonnullable: uint8 not null
uint16_nullable: uint16
uint16_nonnullable: uint16 not null
uint32_nullable: uint32
uint32_nonnullable: uint32 not null
uint64_nullable: uint64
uint64_nonnullable: uint64 not null
float32_nullable: float
float32_nonnullable: float not null
float64_nullable: double
float64_nonnullable: double not null
binary_nullable: binary
binary_nonnullable: binary not null
utf8_nullable: string
utf8_nonnullable: string not null
fixedsizebinary_19_nullable: fixed_size_binary[19]
fixedsizebinary_19_nonnullable: fixed_size_binary[19] not null
fixedsizebinary_120_nullable: fixed_size_binary[120]
fixedsizebinary_120_nonnullable: fixed_size_binary[120] not null

--

is this supposed to work?
are there reference files already available somewhere?

cheers,
-s


Re: Benchmarking mailing list thread [was Fwd: [Discuss] Benchmarking infrastructure]

2019-04-24 Thread Sebastien Binet
On Wed, Apr 24, 2019 at 11:22 AM Antoine Pitrou  wrote:

>
> Hi Areg,
>
> On 23/04/2019 at 23:43, Melik-Adamyan, Areg wrote:
> > Because we are using Google Benchmark, which has a specific format, there
> > is a tool called benchcmp which compares two runs:
> >
> > $ benchcmp old.txt new.txt
> > benchmark         old ns/op     new ns/op     delta
> > BenchmarkConcat   523           68.6          -86.88%
> >
> > So the comparison part is done and there is no need to create infra for
> > that.
>

"surprisingly" Go is already using that benchmark format :)
and (on top of a Go-based benchcmp command) there is also a benchstat
command that, given multiple before/after samples for each benchmark, adds
some statistical analysis (per-benchmark variation and a significance test
on the delta):
 https://godoc.org/golang.org/x/perf/cmd/benchstat

using the "benchmark" file format of benchcmp and benchstat would allow
better cross-language interop.
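Both tools consume Go's text benchmark format: a `Benchmark...` name, an iteration count, then value/unit pairs such as `523 ns/op`. A minimal parser plus the relative change that benchcmp prints in its `delta` column can be sketched as below. `nsPerOp` and `deltaPercent` are hypothetical helpers; the ns/op values come from the benchcmp example quoted above and the iteration counts are placeholders.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// nsPerOp extracts the ns/op value from one line of Go benchmark output,
// e.g. "BenchmarkConcat-8  1000000  523 ns/op  16 B/op". Layout: name,
// iteration count, then (value, unit) pairs.
func nsPerOp(line string) (name string, ns float64, err error) {
	f := strings.Fields(line)
	if len(f) < 4 || !strings.HasPrefix(f[0], "Benchmark") {
		return "", 0, fmt.Errorf("not a benchmark line: %q", line)
	}
	for i := 2; i+1 < len(f); i += 2 {
		if f[i+1] == "ns/op" {
			ns, err = strconv.ParseFloat(f[i], 64)
			return f[0], ns, err
		}
	}
	return "", 0, fmt.Errorf("no ns/op in %q", line)
}

// deltaPercent is the relative change (new-old)/old, in percent.
func deltaPercent(oldNs, newNs float64) float64 {
	return (newNs - oldNs) / oldNs * 100
}

func main() {
	_, oldNs, _ := nsPerOp("BenchmarkConcat 1000000 523 ns/op")
	_, newNs, _ := nsPerOp("BenchmarkConcat 10000000 68.6 ns/op")
	fmt.Printf("%+.2f%%\n", deltaPercent(oldNs, newNs)) // -86.88%
}
```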

cheers,
-s


Re: Arrow as a common open standard for machine learning data

2019-06-16 Thread Sebastien Binet
hi there,

On Sun, Jun 16, 2019 at 6:07 AM Micah Kornfield 
wrote:

> > *  Can Feather files already be read in Java/Go/C#/...?
>
> I don't know the status of feather.   The arrow file format should be
> readable by Java and C++ (I believe all the languages that bind C++ also
> support the format; these include Python, Ruby and R). A quick code
> search of the repo makes me think that there is also support for C#, Rust
> and JavaScript. It doesn't look like the file format is supported in Go
> yet, but it probably wouldn't be too hard to do.
>
Go doesn't handle Feather files.
But there is support (not yet feature complete, see [1]) for Arrow files
(r/w):
-  https://godoc.org/github.com/apache/arrow/go/arrow/ipc
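Since Go could read Arrow files but not Feather files at this point, the two can be told apart by their magic bytes: Feather v1 files begin with `FEA1`, while Arrow IPC files begin with `ARROW1` (padded to 8 bytes) and also end with `ARROW1`. The hypothetical `sniffFormat` below only inspects that framing and validates nothing else; it uses `os.ReadFile`, so it assumes Go 1.16 or later.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
)

// sniffFormat guesses a file's format from its framing only:
// Arrow IPC files start and end with "ARROW1"; Feather v1 files
// start with "FEA1". It does not validate footers or batches.
func sniffFormat(b []byte) string {
	switch {
	case bytes.HasPrefix(b, []byte("ARROW1")) && bytes.HasSuffix(b, []byte("ARROW1")):
		return "arrow-file"
	case bytes.HasPrefix(b, []byte("FEA1")):
		return "feather-v1"
	default:
		return "unknown"
	}
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: sniff FILE")
		return
	}
	b, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(sniffFormat(b))
}
```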

hth,
-s

[1]: https://issues.apache.org/jira/browse/ARROW-3679


Re: Go becomes the 4th language to be a part of the Arrow integration tests (!)

2019-06-24 Thread Sebastien Binet
On Sat, Jun 22, 2019 at 10:08 PM Wes McKinney  wrote:

> I'm excited to announce that Go has become the 4th language to
> officially participate in the Arrow binary protocol integration tests,
> after Java, C++, and JavaScript:
>
>
> https://github.com/apache/arrow/commit/4ba2763150459c9eb4139e5954d9b5526b8ef0ee
>
> This is a huge milestone toward making Go a first-class citizen in the
> Apache Arrow world. Congrats to Sebastien, Stuart, Alexandre, and the
> rest of the Go contributors!
>

as you wrote, it was a team effort :)

we now "just" need to implement Map, Dictionary, Union, Extension and (full
support for) Decimal128 arrays.

-s


Struct support for arrow/go

2018-06-18 Thread Sebastien Binet
hi there,

(apologies if this isn't the correct forum to raise this kind of
question...)

I am investigating whether using arrow/go for my application would be
feasible.
I'd like to be able to provide Arrow support (in Go) for reading ROOT
files[1], a file format used in particle physics.
ROOT provides access to row- and col-wise data, with metadata that allows
files to be self-describing.

after some initial investigation of the feature set of the Go package, it
seems it's missing support for Structs.
does anyone know what amount of work adding it to arrow/go would entail?
(I am a reasonably seasoned Go programmer, but a complete newbie with
Arrow...)

or perhaps somebody is already working on that?

cheers,
-s

[1] https://root.cern


Re: Struct support for arrow/go

2018-06-19 Thread Sebastien Binet
hi Jim,

(happy to see you there!)

On Mon, Jun 18, 2018 at 4:43 PM Jim Pivarski  wrote:

> As far as I know, List> hasn't been implemented yet. These are
> some previous discussions on the Arrow and Parquet mailing lists.
>
> - http://mail-archives.apache.org/mod_mbox/parquet-dev/201801.mbox/%3C4221159.JjGKrUjAfh@gudok6%3E
> - https://lists.apache.org/thread.html/45179fd4023764f8303c6dee57a9b65afcb660ab0c5d60842caf30d9@%3Cdev.arrow.apache.org%3E
> - http://mail-archives.apache.org/mod_mbox/arrow-dev/201805.mbox/%3CCAJEf=x6ouiagp5lnfxfftem_yry0jxazu7ysfv9qh9v14y_...@mail.gmail.com%3E
>
>
thanks for the pointers.
but the Go package, AFAICT, is even missing support for structs (and it's a
pure-Go package, reimplementing things from first principles).
hence my question on the amount of work needed to implement (at least)
support for structs.

-s


Re: [Go] Go failures on Travis-CI

2018-07-12 Thread Sebastien Binet
I'll have a look tomorrow (Paris time).
It looks like a GOPATH issue.
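For context, the error comes from the go command's "internal" rule: a package path containing an `internal` element may only be imported by code in the tree rooted at the parent of that `internal` directory. One plausible explanation of the log (an assumption, not confirmed in this thread) is that the fork was checked out under a GOPATH directory such as `github.com/pitrou/arrow`, so the importing packages no longer live under `github.com/apache/arrow/go/arrow`. The hypothetical `internalImportAllowed` below sketches the rule; it ignores the standard library's top-level `internal/...` packages.

```go
package main

import (
	"fmt"
	"strings"
)

// internalImportAllowed reports whether importer may import imported
// under the "internal" rule, using the final "internal" path element.
func internalImportAllowed(importer, imported string) bool {
	elems := strings.Split(imported, "/")
	last := -1
	for i, e := range elems {
		if e == "internal" {
			last = i
		}
	}
	if last < 0 {
		return true // no "internal" element: importable from anywhere
	}
	parent := strings.Join(elems[:last], "/")
	return importer == parent || strings.HasPrefix(importer, parent+"/")
}

func main() {
	imported := "github.com/apache/arrow/go/arrow/internal/bitutil"
	fmt.Println(internalImportAllowed("github.com/apache/arrow/go/arrow/array", imported)) // true
	// Same code, checked out under a (hypothetical) fork path:
	fmt.Println(internalImportAllowed("github.com/pitrou/arrow/go/arrow/array", imported)) // false
}
```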

sent from my droid

On Thu, Jul 12, 2018, 20:52 Antoine Pitrou  wrote:

>
> Hello,
>
> I'm getting persistent failures in the Go job on Travis-CI:
> https://travis-ci.org/pitrou/arrow/jobs/403221354
>
> Is this expected?  Excerpt:
>
> """
> $ go get -t -v ./...
> github.com/apache/arrow (download)
> github.com/stretchr/testify (download)
> go/arrow/type_traits_boolean.go:20:2: use of internal package not allowed
> go/arrow/array/array.go:23:2: use of internal package not allowed
> go/arrow/array/array.go:24:2: use of internal package not allowed
> go/arrow/math/math_amd64.go:22:2: use of internal package not allowed
> go/arrow/memory/memory_amd64.go:22:2: use of internal package not allowed
> go/arrow/memory/buffer.go:22:2: use of internal package not allowed
> The command "eval go get -t -v ./... " failed. Retrying, 2 of 3.
> """
>
> Regards
>
> Antoine.
>


[jira] [Created] (ARROW-6146) [Go] implement a Plasma client

2019-08-06 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-6146:
--

 Summary: [Go] implement a Plasma client
 Key: ARROW-6146
 URL: https://issues.apache.org/jira/browse/ARROW-6146
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Sebastien Binet






--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (ARROW-6147) [Go] implement a Flight client

2019-08-06 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-6147:
--

 Summary: [Go] implement a Flight client
 Key: ARROW-6147
 URL: https://issues.apache.org/jira/browse/ARROW-6147
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Sebastien Binet








[jira] [Created] (ARROW-6752) [Go] implement Stringer for Null array

2019-10-01 Thread Sebastien Binet (Jira)
Sebastien Binet created ARROW-6752:
--

 Summary: [Go] implement Stringer for Null array
 Key: ARROW-6752
 URL: https://issues.apache.org/jira/browse/ARROW-6752
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Go
Reporter: Sebastien Binet
Assignee: Sebastien Binet








[jira] [Created] (ARROW-7029) [Go] unsafe pointer arithmetic panic w/ Go-1.14-dev

2019-10-30 Thread Sebastien Binet (Jira)
Sebastien Binet created ARROW-7029:
--

 Summary: [Go] unsafe pointer arithmetic panic w/ Go-1.14-dev
 Key: ARROW-7029
 URL: https://issues.apache.org/jira/browse/ARROW-7029
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Go
Reporter: Sebastien Binet


Go-1.14 (to be released in Feb-2020) adds a new runtime check (`-d=checkptr`,
enabled automatically with -race) that detects invalid unsafe pointer arithmetic:

~~

go test -race -run=Example_minimal .
--- FAIL: Example_minimal (0.00s)
panic: runtime error: unsafe pointer arithmetic [recovered]
 panic: runtime error: unsafe pointer arithmetic

goroutine 1 [running]:
testing.(*InternalExample).processRunResult(0xcadc80, 0x0, 0x0, 0x8927, 0x90a400, 0xcb62c0, 0xca48c8)
	/home/binet/sdk/go/src/testing/example.go:89 +0x71f
testing.runExample.func2(0xbf6675c29511646d, 0x20fb5f, 0xc7f780, 0xca2378, 0xca2008, 0xc86360, 0xcadc80, 0xcadcb0)
	/home/binet/sdk/go/src/testing/run_example.go:58 +0x143
panic(0x90a400, 0xcb62c0)
	/home/binet/sdk/go/src/runtime/panic.go:915 +0x370
github.com/apache/arrow/go/arrow/memory.memory_memset_avx2(0xc9e200, 0x40, 0x40, 0xc9c000)
	/home/binet/work/gonum/src/github.com/apache/arrow/go/arrow/memory/memory_avx2_amd64.go:33 +0xa4
github.com/apache/arrow/go/arrow/memory.Set(...)
	/home/binet/work/gonum/src/github.com/apache/arrow/go/arrow/memory/memory.go:25
github.com/apache/arrow/go/arrow/array.(*builder).init(0xc84600, 0x20)
	/home/binet/work/gonum/src/github.com/apache/arrow/go/arrow/array/builder.go:101 +0x23a
github.com/apache/arrow/go/arrow/array.(*Int64Builder).init(0xc84600, 0x20)
	/home/binet/work/gonum/src/github.com/apache/arrow/go/arrow/array/numericbuilder.gen.go:102 +0x60
github.com/apache/arrow/go/arrow/array.(*Int64Builder).Resize(0xc84600, 0x2)
	/home/binet/work/gonum/src/github.com/apache/arrow/go/arrow/array/numericbuilder.gen.go:125 +0x8c
github.com/apache/arrow/go/arrow/array.(*builder).reserve(0xc84600, 0x1, 0xcad918)
	/home/binet/work/gonum/src/github.com/apache/arrow/go/arrow/array/builder.go:138 +0xdc
github.com/apache/arrow/go/arrow/array.(*Int64Builder).Reserve(0xc84600, 0x1)
	/home/binet/work/gonum/src/github.com/apache/arrow/go/arrow/array/numericbuilder.gen.go:113 +0x68
github.com/apache/arrow/go/arrow/array.(*Int64Builder).Append(0xc84600, 0x1)
	/home/binet/work/gonum/src/github.com/apache/arrow/go/arrow/array/numericbuilder.gen.go:60 +0x46
github.com/apache/arrow/go/arrow_test.Example_minimal()
	/home/binet/work/gonum/src/github.com/apache/arrow/go/arrow/example_test.go:39 +0x153
testing.runExample(0x94714a, 0xf, 0x95d8a8, 0x957614, 0x83, 0x0, 0x0)
	/home/binet/sdk/go/src/testing/run_example.go:62 +0x275
testing.runExamples(0xcaded8, 0xc7a2e0, 0xb, 0xb, 0x100)
	/home/binet/sdk/go/src/testing/example.go:44 +0x212
testing.(*M).Run(0xc00010, 0x0)
	/home/binet/sdk/go/src/testing/testing.go:1125 +0x3b4
main.main()
	_testmain.go:130 +0x224
FAIL github.com/apache/arrow/go/arrow 0.009s
FAIL

~~

 

see:

[https://groups.google.com/forum/#!msg/golang-dev/SzwDoqoRVJA/IvtnBW5oDwAJ]
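For context on what the new check enforces: under the `unsafe.Pointer` rules, a `uintptr` computed from a pointer must be converted back to `unsafe.Pointer` in the same expression, and the result must still point into the original allocation; `-d=checkptr` panics when arithmetic lands outside it. The trace does not show which expression in `memory_memset_avx2` broke the rule, so this sketch only illustrates the allowed pattern on a plain `[]int64`; `sumViaPointerArithmetic` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"unsafe"
)

// sumViaPointerArithmetic walks xs with unsafe pointer arithmetic. Each
// uintptr offset is converted back to unsafe.Pointer in the same
// expression and stays within xs's backing array, so checkptr accepts it.
func sumViaPointerArithmetic(xs []int64) int64 {
	if len(xs) == 0 {
		return 0
	}
	base := unsafe.Pointer(&xs[0])
	size := unsafe.Sizeof(xs[0])
	var sum int64
	for i := 0; i < len(xs); i++ {
		p := (*int64)(unsafe.Pointer(uintptr(base) + uintptr(i)*size))
		sum += *p
	}
	return sum
}

func main() {
	fmt.Println(sumViaPointerArithmetic([]int64{1, 2, 3, 4})) // 10
}
```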

 





[jira] [Created] (ARROW-7270) [Go] preserve CSV reading behaviour, improve memory usage

2019-11-27 Thread Sebastien Binet (Jira)
Sebastien Binet created ARROW-7270:
--

 Summary: [Go] preserve CSV reading behaviour, improve memory usage
 Key: ARROW-7270
 URL: https://issues.apache.org/jira/browse/ARROW-7270
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Go
Reporter: Sebastien Binet
Assignee: Sebastien Binet








[jira] [Created] (ARROW-7357) [Go] migrate from pkg/errors to x/xerrors

2019-12-09 Thread Sebastien Binet (Jira)
Sebastien Binet created ARROW-7357:
--

 Summary: [Go] migrate from pkg/errors to x/xerrors
 Key: ARROW-7357
 URL: https://issues.apache.org/jira/browse/ARROW-7357
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Go
Reporter: Sebastien Binet


we should migrate away from `pkg/errors` to `golang.org/x/xerrors` to ensure 
better error handling (and one that is Go-1.13 compatible).





[jira] [Created] (ARROW-2964) [Go] wire all currently implemented array types in array.MakeFromData

2018-08-02 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-2964:
--

 Summary: [Go] wire all currently implemented array types in 
array.MakeFromData
 Key: ARROW-2964
 URL: https://issues.apache.org/jira/browse/ARROW-2964
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Sebastien Binet
Assignee: Sebastien Binet


it should be possible to use `array.MakeFromData` to create values of all
implemented array types.

right now, only `arrow.BOOL`, `arrow.INT32`, `arrow.UINT64`, `arrow.INT64` and 
`arrow.FLOAT64` are wired in.
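One common way to "wire in" types is a table from type ID to constructor: supporting a new array kind is then a one-line change, and unwired types are reported instead of silently mishandled. This is a toy sketch (stdlib only). `Type`, `Data`, `Array` and `makeFromData` are hypothetical stand-ins, not the arrow/go definitions, and a single `generic` array type is reused for every ID to keep it short.

```go
package main

import "fmt"

// Type is a toy stand-in for arrow type IDs.
type Type int

const (
	BOOL Type = iota
	INT32
	UINT64
	INT64
	FLOAT64
	STRING
)

// Data is a toy stand-in for array data: a type ID and a length.
type Data struct {
	Type Type
	Len  int
}

// Array is the minimal behaviour shared by the toy arrays.
type Array interface {
	DataType() Type
	Len() int
}

// generic is one toy array implementation reused for every type.
type generic struct{ d *Data }

func (g generic) DataType() Type { return g.d.Type }
func (g generic) Len() int       { return g.d.Len }

// makers maps each wired type ID to its constructor.
var makers = map[Type]func(*Data) Array{
	BOOL:    func(d *Data) Array { return generic{d} },
	INT32:   func(d *Data) Array { return generic{d} },
	UINT64:  func(d *Data) Array { return generic{d} },
	INT64:   func(d *Data) Array { return generic{d} },
	FLOAT64: func(d *Data) Array { return generic{d} },
}

// makeFromData dispatches on the type ID and reports unwired types.
func makeFromData(d *Data) (Array, error) {
	mk, ok := makers[d.Type]
	if !ok {
		return nil, fmt.Errorf("makeFromData: type %d not wired in", d.Type)
	}
	return mk(d), nil
}

func main() {
	a, err := makeFromData(&Data{Type: INT64, Len: 3})
	fmt.Println(a.Len(), err) // 3 <nil>
	_, err = makeFromData(&Data{Type: STRING, Len: 1})
	fmt.Println(err != nil) // true
}
```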





[jira] [Created] (ARROW-3021) support for List

2018-08-08 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-3021:
--

 Summary: support for List
 Key: ARROW-3021
 URL: https://issues.apache.org/jira/browse/ARROW-3021
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Sebastien Binet
Assignee: Sebastien Binet


go-arrow should have support for creating List arrays.
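For reference, an Arrow list array stores all elements in one flat child array plus `length+1` int32 offsets, with slot `i` being `values[offsets[i]:offsets[i+1]]`. The toy sketch below (stdlib only, hypothetical `ListInt32`, validity bitmap omitted for brevity) shows what a builder has to maintain when appending lists.

```go
package main

import "fmt"

// ListInt32 is a toy list<int32> column in the Arrow list layout.
type ListInt32 struct {
	Offsets []int32 // len+1 entries once non-empty
	Values  []int32 // flat child array
}

// Len returns the number of list slots.
func (l ListInt32) Len() int {
	if len(l.Offsets) == 0 {
		return 0
	}
	return len(l.Offsets) - 1
}

// At returns slot i as a sub-slice of the child values (no copy).
func (l ListInt32) At(i int) []int32 {
	return l.Values[l.Offsets[i]:l.Offsets[i+1]]
}

// appendList appends one list slot, extending values and offsets.
func (l *ListInt32) appendList(xs ...int32) {
	if len(l.Offsets) == 0 {
		l.Offsets = append(l.Offsets, 0)
	}
	l.Values = append(l.Values, xs...)
	l.Offsets = append(l.Offsets, int32(len(l.Values)))
}

func main() {
	var l ListInt32
	l.appendList(1, 2, 3)
	l.appendList() // an empty list
	l.appendList(4, 5)
	fmt.Println(l.Len(), l.Offsets, l.At(2)) // 3 [0 3 3 5] [4 5]
}
```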





[jira] [Created] (ARROW-3022) support for Struct

2018-08-08 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-3022:
--

 Summary: support for Struct
 Key: ARROW-3022
 URL: https://issues.apache.org/jira/browse/ARROW-3022
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Sebastien Binet
Assignee: Sebastien Binet


go-arrow should have support for creating Struct arrays.
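In the Arrow columnar format a struct array owns no value buffer of its own: it has an optional validity bitmap plus one child array per field, and in the simple case every child has the struct's length. The toy sketch below is stdlib only and uses a hypothetical `StructXName` type with fields `x int64` and `name string`, plus a `[]bool` instead of a packed bitmap. It shows the invariant a builder must keep: even a null struct slot appends a placeholder to every child.

```go
package main

import "fmt"

// StructXName is a toy struct<x: int64, name: string> column.
type StructXName struct {
	Valid []bool   // struct-level validity
	X     []int64  // child 0
	Name  []string // child 1
}

// Append adds a valid struct slot.
func (s *StructXName) Append(x int64, name string) {
	s.Valid = append(s.Valid, true)
	s.X = append(s.X, x)
	s.Name = append(s.Name, name)
}

// AppendNull adds a null struct slot; children still grow so all lengths
// stay equal, and the child values under it are placeholders.
func (s *StructXName) AppendNull() {
	s.Valid = append(s.Valid, false)
	s.X = append(s.X, 0)
	s.Name = append(s.Name, "")
}

// Len returns the number of struct slots.
func (s *StructXName) Len() int { return len(s.Valid) }

func main() {
	var s StructXName
	s.Append(1, "a")
	s.AppendNull()
	s.Append(3, "c")
	fmt.Println(s.Len(), s.Valid, s.X) // 3 [true false true] [1 0 3]
}
```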





[jira] [Created] (ARROW-3036) [Go] add support for slicing Arrays

2018-08-10 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-3036:
--

 Summary: [Go] add support for slicing Arrays
 Key: ARROW-3036
 URL: https://issues.apache.org/jira/browse/ARROW-3036
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Sebastien Binet
Assignee: Sebastien Binet








[jira] [Created] (ARROW-3037) [Go] add support NullArray

2018-08-10 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-3037:
--

 Summary: [Go] add support NullArray
 Key: ARROW-3037
 URL: https://issues.apache.org/jira/browse/ARROW-3037
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Sebastien Binet








[jira] [Created] (ARROW-3038) [Go] add support for StringArray

2018-08-10 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-3038:
--

 Summary: [Go] add support for StringArray
 Key: ARROW-3038
 URL: https://issues.apache.org/jira/browse/ARROW-3038
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Sebastien Binet








[jira] [Created] (ARROW-3039) [Go] add support for DictionaryArray

2018-08-10 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-3039:
--

 Summary: [Go] add support for DictionaryArray
 Key: ARROW-3039
 URL: https://issues.apache.org/jira/browse/ARROW-3039
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Sebastien Binet








[jira] [Created] (ARROW-3040) [Go] add support for comparing Arrays

2018-08-10 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-3040:
--

 Summary: [Go] add support for comparing Arrays
 Key: ARROW-3040
 URL: https://issues.apache.org/jira/browse/ARROW-3040
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Sebastien Binet








[jira] [Created] (ARROW-3041) [Go] add support for TimeArray

2018-08-10 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-3041:
--

 Summary: [Go] add support for TimeArray
 Key: ARROW-3041
 URL: https://issues.apache.org/jira/browse/ARROW-3041
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Sebastien Binet








[jira] [Created] (ARROW-3042) [Go] add badge to GoDoc in the Go-Arrow README

2018-08-10 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-3042:
--

 Summary: [Go] add badge to GoDoc in the Go-Arrow README
 Key: ARROW-3042
 URL: https://issues.apache.org/jira/browse/ARROW-3042
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Sebastien Binet








[jira] [Created] (ARROW-3063) [Go] move list of supported/TODO features to confluence

2018-08-16 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-3063:
--

 Summary: [Go] move list of supported/TODO features to confluence
 Key: ARROW-3063
 URL: https://issues.apache.org/jira/browse/ARROW-3063
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Sebastien Binet


as mentioned in https://github.com/apache/arrow/pull/2421#discussion_r210033779 
we should move the list of supported features (and those that still need to be 
implemented) to confluence.

filing this so we don't forget about it.





[jira] [Created] (ARROW-3130) [Go] add initial support for Go modules

2018-08-28 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-3130:
--

 Summary: [Go] add initial support for Go modules
 Key: ARROW-3130
 URL: https://issues.apache.org/jira/browse/ARROW-3130
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Sebastien Binet
Assignee: Sebastien Binet


Go1.11 has been released and provides initial support for Go modules, a new way 
to properly handle (module) version management.

we should start to add some support for this.





[jira] [Created] (ARROW-3131) [Go] add test for Go-1.11

2018-08-28 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-3131:
--

 Summary: [Go] add test for Go-1.11
 Key: ARROW-3131
 URL: https://issues.apache.org/jira/browse/ARROW-3131
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Sebastien Binet


Go-1.11 has been released.

we should start to test this new stable release.





[jira] [Created] (ARROW-3577) [Go] add support for ChunkedArray

2018-10-20 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-3577:
--

 Summary: [Go] add support for ChunkedArray
 Key: ARROW-3577
 URL: https://issues.apache.org/jira/browse/ARROW-3577
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Sebastien Binet
Assignee: Sebastien Binet








[jira] [Created] (ARROW-3582) CI: Java build is always triggered

2018-10-21 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-3582:
--

 Summary: CI: Java build is always triggered
 Key: ARROW-3582
 URL: https://issues.apache.org/jira/browse/ARROW-3582
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools
Reporter: Sebastien Binet


The `JDK: openjdk8 Compiler: gcc C++` build is always triggered, even when 
_e.g._ only Go files are modified:
- https://travis-ci.org/sbinet-gonum/arrow/jobs/444128507





[jira] [Created] (ARROW-3584) [Go] add support for Table

2018-10-21 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-3584:
--

 Summary: [Go] add support for Table
 Key: ARROW-3584
 URL: https://issues.apache.org/jira/browse/ARROW-3584
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Sebastien Binet








[jira] [Created] (ARROW-3612) [Go] implement RecordBatch and RecordBatchReader

2018-10-25 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-3612:
--

 Summary: [Go] implement RecordBatch and RecordBatchReader
 Key: ARROW-3612
 URL: https://issues.apache.org/jira/browse/ARROW-3612
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Sebastien Binet








[jira] [Created] (ARROW-3621) [Go] implement TableBatchReader

2018-10-26 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-3621:
--

 Summary: [Go] implement TableBatchReader
 Key: ARROW-3621
 URL: https://issues.apache.org/jira/browse/ARROW-3621
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Sebastien Binet








[jira] [Created] (ARROW-3622) [Go] implement Schema.Equal

2018-10-26 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-3622:
--

 Summary: [Go] implement Schema.Equal
 Key: ARROW-3622
 URL: https://issues.apache.org/jira/browse/ARROW-3622
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Sebastien Binet








[jira] [Created] (ARROW-3623) [Go] implement Field.Equal

2018-10-26 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-3623:
--

 Summary: [Go] implement Field.Equal
 Key: ARROW-3623
 URL: https://issues.apache.org/jira/browse/ARROW-3623
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Sebastien Binet








[jira] [Created] (ARROW-3625) [Go] add examples for Table, Record and {Table,Record}Reader

2018-10-26 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-3625:
--

 Summary: [Go] add examples for Table, Record and 
{Table,Record}Reader
 Key: ARROW-3625
 URL: https://issues.apache.org/jira/browse/ARROW-3625
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Sebastien Binet








[jira] [Created] (ARROW-3626) [Go] add a CSV TableReader

2018-10-26 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-3626:
--

 Summary: [Go] add a CSV TableReader
 Key: ARROW-3626
 URL: https://issues.apache.org/jira/browse/ARROW-3626
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Sebastien Binet


assuming ARROW-3621 goes in, it should be relatively straightforward to 
implement a TableReader off a CSV file, using `encoding/csv`.

also drawing inspiration from:
- https://github.com/apache/arrow/blob/master/cpp/src/arrow/csv/reader.h
- https://github.com/apache/arrow/blob/master/cpp/src/arrow/csv/reader.cc





[jira] [Created] (ARROW-3627) [Go] add RecordBatchBuilder

2018-10-26 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-3627:
--

 Summary: [Go] add RecordBatchBuilder
 Key: ARROW-3627
 URL: https://issues.apache.org/jira/browse/ARROW-3627
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Sebastien Binet








[jira] [Created] (ARROW-3640) [Go] add support for Tensors

2018-10-27 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-3640:
--

 Summary: [Go] add support for Tensors
 Key: ARROW-3640
 URL: https://issues.apache.org/jira/browse/ARROW-3640
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Sebastien Binet








[jira] [Created] (ARROW-3672) [Go] implement Time32 array

2018-11-01 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-3672:
--

 Summary: [Go] implement Time32 array
 Key: ARROW-3672
 URL: https://issues.apache.org/jira/browse/ARROW-3672
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Sebastien Binet








[jira] [Created] (ARROW-3671) [Go] implement Interval array

2018-11-01 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-3671:
--

 Summary: [Go] implement Interval array
 Key: ARROW-3671
 URL: https://issues.apache.org/jira/browse/ARROW-3671
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Sebastien Binet








[jira] [Created] (ARROW-3674) [Go] implement Date32 array

2018-11-01 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-3674:
--

 Summary: [Go] implement Date32 array
 Key: ARROW-3674
 URL: https://issues.apache.org/jira/browse/ARROW-3674
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Sebastien Binet








[jira] [Created] (ARROW-3675) [Go] implement Date64 array

2018-11-01 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-3675:
--

 Summary: [Go] implement Date64 array
 Key: ARROW-3675
 URL: https://issues.apache.org/jira/browse/ARROW-3675
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Sebastien Binet








[jira] [Created] (ARROW-3673) [Go] implement Time64 array

2018-11-01 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-3673:
--

 Summary: [Go] implement Time64 array
 Key: ARROW-3673
 URL: https://issues.apache.org/jira/browse/ARROW-3673
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Sebastien Binet


[jira] [Created] (ARROW-3677) [Go] implement FixedSizedBinary array

2018-11-01 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-3677:
--

 Summary: [Go] implement FixedSizedBinary array
 Key: ARROW-3677
 URL: https://issues.apache.org/jira/browse/ARROW-3677
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Sebastien Binet


[jira] [Created] (ARROW-3676) [Go] implement Decimal128 array

2018-11-01 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-3676:
--

 Summary: [Go] implement Decimal128 array
 Key: ARROW-3676
 URL: https://issues.apache.org/jira/browse/ARROW-3676
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Sebastien Binet


[jira] [Created] (ARROW-3678) [Go] implement Union array

2018-11-01 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-3678:
--

 Summary: [Go] implement Union array
 Key: ARROW-3678
 URL: https://issues.apache.org/jira/browse/ARROW-3678
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Sebastien Binet


[jira] [Created] (ARROW-3679) [Go] implement IPC protocol

2018-11-01 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-3679:
--

 Summary: [Go] implement IPC protocol
 Key: ARROW-3679
 URL: https://issues.apache.org/jira/browse/ARROW-3679
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Sebastien Binet


[jira] [Created] (ARROW-3680) [Go] implement Float16 array

2018-11-01 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-3680:
--

 Summary: [Go] implement Float16 array
 Key: ARROW-3680
 URL: https://issues.apache.org/jira/browse/ARROW-3680
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Sebastien Binet


[jira] [Created] (ARROW-3681) [Go] add benchmarks for CSV reader

2018-11-01 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-3681:
--

 Summary: [Go] add benchmarks for CSV reader
 Key: ARROW-3681
 URL: https://issues.apache.org/jira/browse/ARROW-3681
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Sebastien Binet


[jira] [Created] (ARROW-3682) [Go] unexport encoding/csv.Reader from CSV reader

2018-11-01 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-3682:
--

 Summary: [Go] unexport encoding/csv.Reader from CSV reader
 Key: ARROW-3682
 URL: https://issues.apache.org/jira/browse/ARROW-3682
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Sebastien Binet


This would make it possible to switch to a more performant CSV parser (if one exists).
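
A minimal sketch of the idea, assuming a hypothetical `Reader` type rather than the actual arrow/csv API: the parser sits behind an unexported `rowSource` interface (also hypothetical), which `*encoding/csv.Reader` already satisfies, so a faster parser could later be plugged in without breaking callers. `sliceSource` is an illustrative stand-in for such an alternative parser.

```go
package main

import (
	"encoding/csv"
	"io"
)

// rowSource is a hypothetical, unexported abstraction over the CSV parser.
// *encoding/csv.Reader already satisfies it through its Read method.
type rowSource interface {
	Read() ([]string, error)
}

// Reader is a hypothetical public CSV reader. Because parser is unexported,
// callers never see (or depend on) the concrete *csv.Reader.
type Reader struct {
	parser rowSource
}

// NewReader uses the standard-library parser by default.
func NewReader(r io.Reader) *Reader {
	return &Reader{parser: csv.NewReader(r)}
}

// ReadAll collects rows from whichever parser is plugged in, until io.EOF.
func (r *Reader) ReadAll() ([][]string, error) {
	var rows [][]string
	for {
		row, err := r.parser.Read()
		if err == io.EOF {
			return rows, nil
		}
		if err != nil {
			return nil, err
		}
		rows = append(rows, row)
	}
}

// sliceSource stands in for an alternative, faster parser: it simply
// replays pre-parsed rows, showing that Reader accepts any rowSource.
type sliceSource struct {
	rows [][]string
	next int
}

func (s *sliceSource) Read() ([]string, error) {
	if s.next >= len(s.rows) {
		return nil, io.EOF
	}
	row := s.rows[s.next]
	s.next++
	return row, nil
}
```

Since `parser` is unexported, code outside the package cannot reach the `*csv.Reader`, so replacing it with another `rowSource` does not break anyone.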


[jira] [Created] (ARROW-3683) [Go] add functional-option style to CSV reader

2018-11-01 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-3683:
--

 Summary: [Go] add functional-option style to CSV reader
 Key: ARROW-3683
 URL: https://issues.apache.org/jira/browse/ARROW-3683
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Sebastien Binet


This will allow seamless handling of:
 * header/no-header CSV files
 * comma separator
 * comment character


[jira] [Created] (ARROW-3684) [Go] add chunk size option to CSV reader

2018-11-01 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-3684:
--

 Summary: [Go] add chunk size option to CSV reader
 Key: ARROW-3684
 URL: https://issues.apache.org/jira/browse/ARROW-3684
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Sebastien Binet


[jira] [Created] (ARROW-3929) [Go] improve memory usage of CSV reader to improve runtime performances

2018-12-03 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-3929:
--

 Summary: [Go] improve memory usage of CSV reader to improve runtime performances
 Key: ARROW-3929
 URL: https://issues.apache.org/jira/browse/ARROW-3929
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Sebastien Binet
Assignee: Sebastien Binet


[jira] [Created] (ARROW-3951) [Go] implement a CSV writer

2018-12-07 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-3951:
--

 Summary: [Go] implement a CSV writer
 Key: ARROW-3951
 URL: https://issues.apache.org/jira/browse/ARROW-3951
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Sebastien Binet


[jira] [Created] (ARROW-4689) [Go] add support for WASM

2019-02-26 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-4689:
--

 Summary: [Go] add support for WASM
 Key: ARROW-4689
 URL: https://issues.apache.org/jira/browse/ARROW-4689
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Sebastien Binet
Assignee: Sebastien Binet


[jira] [Created] (ARROW-4826) [Go] export Flush method for CSV writer

2019-03-11 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-4826:
--

 Summary: [Go] export Flush method for CSV writer
 Key: ARROW-4826
 URL: https://issues.apache.org/jira/browse/ARROW-4826
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Sebastien Binet


It should be possible to flush the data that has been passed to the arrow/csv.Writer out to the underlying io.Writer.
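
A sketch of what an exported Flush could look like, assuming a hypothetical `Writer` wrapper rather than the real arrow/csv.Writer. `encoding/csv.Writer` buffers its output, and its own `Flush` returns nothing, so the exported method both flushes that buffer and surfaces any error via `csv.Writer.Error`. The `render` helper is illustrative only.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"io"
)

// Writer is a hypothetical stand-in for arrow/csv.Writer: rows are buffered
// in an encoding/csv.Writer before they reach the destination io.Writer.
type Writer struct {
	w *csv.Writer
}

func NewWriter(dst io.Writer) *Writer { return &Writer{w: csv.NewWriter(dst)} }

// Write buffers one row; it may not reach dst until Flush is called.
func (w *Writer) Write(row []string) error { return w.w.Write(row) }

// Flush pushes all buffered rows to the underlying io.Writer and reports
// any error that occurred while writing or flushing.
func (w *Writer) Flush() error {
	w.w.Flush()
	return w.w.Error()
}

// render writes rows into an in-memory buffer; for small outputs, skipping
// the Flush call would leave the buffer empty when it is read.
func render(rows [][]string) (string, error) {
	var buf bytes.Buffer
	w := NewWriter(&buf)
	for _, row := range rows {
		if err := w.Write(row); err != nil {
			return "", err
		}
	}
	if err := w.Flush(); err != nil {
		return "", err
	}
	return buf.String(), nil
}
```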


[jira] [Created] (ARROW-4852) [Go] add shmem allocator

2019-03-13 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-4852:
--

 Summary: [Go] add shmem allocator
 Key: ARROW-4852
 URL: https://issues.apache.org/jira/browse/ARROW-4852
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Sebastien Binet


Go-Arrow doesn't implement the IPC protocol yet.

In the meantime, a nice stop-gap solution for exchanging data with other languages would be a shared-memory (shmem) allocator in which Arrow arrays could be placed.


[jira] [Created] (ARROW-5108) [Go] implement reading primitive arrays from Arrow file

2019-04-03 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5108:
--

 Summary: [Go] implement reading primitive arrays from Arrow file
 Key: ARROW-5108
 URL: https://issues.apache.org/jira/browse/ARROW-5108
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Go
Reporter: Sebastien Binet
Assignee: Sebastien Binet


[jira] [Created] (ARROW-5109) [Go] implement reading binary/string arrays from Arrow file

2019-04-03 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5109:
--

 Summary: [Go] implement reading binary/string arrays from Arrow file
 Key: ARROW-5109
 URL: https://issues.apache.org/jira/browse/ARROW-5109
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Go
Reporter: Sebastien Binet
Assignee: Sebastien Binet


[jira] [Created] (ARROW-5110) [Go] implement reading struct arrays from Arrow file

2019-04-03 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5110:
--

 Summary: [Go] implement reading struct arrays from Arrow file
 Key: ARROW-5110
 URL: https://issues.apache.org/jira/browse/ARROW-5110
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Go
Reporter: Sebastien Binet


[jira] [Created] (ARROW-5111) [Go] implement reading list arrays from Arrow file

2019-04-03 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5111:
--

 Summary: [Go] implement reading list arrays from Arrow file
 Key: ARROW-5111
 URL: https://issues.apache.org/jira/browse/ARROW-5111
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Go
Reporter: Sebastien Binet


[jira] [Created] (ARROW-5112) [Go] implement writing arrays to Arrow file

2019-04-03 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5112:
--

 Summary: [Go] implement writing arrays to Arrow file
 Key: ARROW-5112
 URL: https://issues.apache.org/jira/browse/ARROW-5112
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Go
Reporter: Sebastien Binet


[jira] [Created] (ARROW-5119) [Go] invalid Stringer implementation for array.Boolean

2019-04-04 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5119:
--

 Summary: [Go] invalid Stringer implementation for array.Boolean
 Key: ARROW-5119
 URL: https://issues.apache.org/jira/browse/ARROW-5119
 Project: Apache Arrow
  Issue Type: Bug
  Components: Go
Reporter: Sebastien Binet
Assignee: Sebastien Binet


The Stringer implementation of array.Boolean loops over the underlying `[]byte` buffer (whose length does not, in general, match the number of boolean values, since they are bit-packed) instead of using array.Interface.Len.
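
To illustrate the bug, here is a simplified, hypothetical model of a bit-packed boolean array (`boolArray` is not the actual array.Boolean code, and nulls are ignored for brevity). Ten values fit in two bytes, so a loop over the byte slice visits the wrong number of entries, while a loop bounded by `Len()` prints exactly ten.

```go
package main

import (
	"fmt"
	"strings"
)

// boolArray models a bit-packed boolean array: value i lives in bit i%8 of
// byte i/8, least-significant bit first, as in the Arrow format.
type boolArray struct {
	bits   []byte
	length int
}

func (a boolArray) Len() int { return a.length }

func (a boolArray) Value(i int) bool { return a.bits[i/8]&(1<<uint(i%8)) != 0 }

// String iterates over Len() logical values, not over len(a.bits) bytes.
func (a boolArray) String() string {
	var sb strings.Builder
	sb.WriteString("[")
	for i := 0; i < a.Len(); i++ {
		if i > 0 {
			sb.WriteString(" ")
		}
		fmt.Fprintf(&sb, "%v", a.Value(i))
	}
	sb.WriteString("]")
	return sb.String()
}
```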


[jira] [Created] (ARROW-5172) [Go] implement reading fixed-size binary arrays from Arrow file

2019-04-16 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5172:
--

 Summary: [Go] implement reading fixed-size binary arrays from Arrow file
 Key: ARROW-5172
 URL: https://issues.apache.org/jira/browse/ARROW-5172
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Go
Reporter: Sebastien Binet
Assignee: Sebastien Binet


[jira] [Created] (ARROW-5173) [Go] handle multiple concatenated streams back-to-back

2019-04-16 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5173:
--

 Summary: [Go] handle multiple concatenated streams back-to-back
 Key: ARROW-5173
 URL: https://issues.apache.org/jira/browse/ARROW-5173
 Project: Apache Arrow
  Issue Type: Bug
  Components: Go
Reporter: Sebastien Binet
Assignee: Sebastien Binet


[jira] [Created] (ARROW-5174) [Go] implement Stringer for DataTypes

2019-04-16 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5174:
--

 Summary: [Go] implement Stringer for DataTypes
 Key: ARROW-5174
 URL: https://issues.apache.org/jira/browse/ARROW-5174
 Project: Apache Arrow
  Issue Type: Bug
  Components: Go
Reporter: Sebastien Binet
Assignee: Sebastien Binet


[jira] [Created] (ARROW-5233) [Go] migrate to new flatbuffers-v0.11.0

2019-04-29 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5233:
--

 Summary: [Go] migrate to new flatbuffers-v0.11.0
 Key: ARROW-5233
 URL: https://issues.apache.org/jira/browse/ARROW-5233
 Project: Apache Arrow
  Issue Type: Bug
  Components: Go
Reporter: Sebastien Binet
Assignee: Sebastien Binet


migrating to v0.11.0 improves the generated Go code (better handling of 
booleans and enums)


[jira] [Created] (ARROW-5246) [Go] use Go-1.12 in CI

2019-04-30 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5246:
--

 Summary: [Go] use Go-1.12 in CI
 Key: ARROW-5246
 URL: https://issues.apache.org/jira/browse/ARROW-5246
 Project: Apache Arrow
  Issue Type: Bug
  Components: Go
Reporter: Sebastien Binet


we should bump our CI for Go to the new Go-1.12 version.


[jira] [Created] (ARROW-5266) [Go] implement read/write IPC for Float16

2019-05-06 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5266:
--

 Summary: [Go] implement read/write IPC for Float16
 Key: ARROW-5266
 URL: https://issues.apache.org/jira/browse/ARROW-5266
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Go
Reporter: Sebastien Binet


[jira] [Created] (ARROW-5267) [Go] implement read/write IPC for dictionaries

2019-05-06 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5267:
--

 Summary: [Go] implement read/write IPC for dictionaries
 Key: ARROW-5267
 URL: https://issues.apache.org/jira/browse/ARROW-5267
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Go
Reporter: Sebastien Binet


[jira] [Created] (ARROW-5308) [Go] remove deprecated Feather format

2019-05-13 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5308:
--

 Summary: [Go] remove deprecated Feather format
 Key: ARROW-5308
 URL: https://issues.apache.org/jira/browse/ARROW-5308
 Project: Apache Arrow
  Issue Type: Bug
  Components: Go
Reporter: Sebastien Binet


We should probably consider removing the Feather format files from the Go backend.

Feather is deprecated, and right now the Go implementation is nothing more than the automatically generated code.


[jira] [Created] (ARROW-5383) [Go] update IPC flatbuf (new Duration type)

2019-05-21 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5383:
--

 Summary: [Go] update IPC flatbuf (new Duration type)
 Key: ARROW-5383
 URL: https://issues.apache.org/jira/browse/ARROW-5383
 Project: Apache Arrow
  Issue Type: Bug
  Components: Go
Reporter: Sebastien Binet


[jira] [Created] (ARROW-5384) [Go] add FixedSizeList array

2019-05-21 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5384:
--

 Summary: [Go] add FixedSizeList array
 Key: ARROW-5384
 URL: https://issues.apache.org/jira/browse/ARROW-5384
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Go
Reporter: Sebastien Binet


[jira] [Created] (ARROW-5385) [Go] implement EXTENSION datatype

2019-05-21 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5385:
--

 Summary: [Go] implement EXTENSION datatype
 Key: ARROW-5385
 URL: https://issues.apache.org/jira/browse/ARROW-5385
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Go
Reporter: Sebastien Binet


[jira] [Created] (ARROW-5387) [Go] properly handle sub-slice of List

2019-05-21 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5387:
--

 Summary: [Go] properly handle sub-slice of List
 Key: ARROW-5387
 URL: https://issues.apache.org/jira/browse/ARROW-5387
 Project: Apache Arrow
  Issue Type: Bug
  Components: Go
Reporter: Sebastien Binet


Consider an `array.List` with the following content:

`[[0 1 2] (null) [3 4 5 6]]`

Sub-slicing it with `array.NewSlice(arr, 1, 3)` yields:

`[(null) []]` instead of `[(null) [3 4 5 6]]`
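
The ticket does not say where the bug sits, so this is only a simplified, hypothetical model (`listArray` is not the actual array.List code) of the invariant a sliced list has to respect: the slice shares the parent's offsets buffer, and element i of the slice must be read through `offsets[offset+i]` and `offsets[offset+i+1]`. With the content from the example above, the slice prints `[(null) [3 4 5 6]]`.

```go
package main

import (
	"fmt"
	"strings"
)

// listArray models an Arrow list array: element j spans
// values[offsets[j]:offsets[j+1]]. A slice shares the parent's buffers
// and only records its own offset and length.
type listArray struct {
	offsets []int32
	values  []int64
	valid   []bool
	offset  int
	length  int
}

// slice returns elements [i, j) without copying any buffer.
func (a listArray) slice(i, j int) listArray {
	s := a
	s.offset = a.offset + i
	s.length = j - i
	return s
}

// value returns element i of the (possibly sliced) array; the slice offset
// must be added before indexing the offsets buffer.
func (a listArray) value(i int) []int64 {
	j := a.offset + i
	return a.values[a.offsets[j]:a.offsets[j+1]]
}

func (a listArray) String() string {
	parts := make([]string, a.length)
	for i := range parts {
		if !a.valid[a.offset+i] {
			parts[i] = "(null)"
			continue
		}
		parts[i] = fmt.Sprint(a.value(i))
	}
	return "[" + strings.Join(parts, " ") + "]"
}
```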


[jira] [Created] (ARROW-5388) [Go] use arrow.TypeEqual in array.NewChunked

2019-05-21 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5388:
--

 Summary: [Go] use arrow.TypeEqual in array.NewChunked
 Key: ARROW-5388
 URL: https://issues.apache.org/jira/browse/ARROW-5388
 Project: Apache Arrow
  Issue Type: Bug
  Components: Go
Reporter: Sebastien Binet


[jira] [Created] (ARROW-5459) [Go] implement Stringer for Float16 (array+dtype)

2019-05-31 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5459:
--

 Summary: [Go] implement Stringer for Float16 (array+dtype)
 Key: ARROW-5459
 URL: https://issues.apache.org/jira/browse/ARROW-5459
 Project: Apache Arrow
  Issue Type: Bug
  Components: Go
Reporter: Sebastien Binet
Assignee: Sebastien Binet


[jira] [Created] (ARROW-5462) [Go] support writing zero-length List

2019-05-31 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5462:
--

 Summary: [Go] support writing zero-length List
 Key: ARROW-5462
 URL: https://issues.apache.org/jira/browse/ARROW-5462
 Project: Apache Arrow
  Issue Type: Bug
  Components: Go
Reporter: Sebastien Binet
Assignee: Sebastien Binet


[jira] [Created] (ARROW-5468) [Go] implement read/write IPC for Timestamp arrays

2019-05-31 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5468:
--

 Summary: [Go] implement read/write IPC for Timestamp arrays
 Key: ARROW-5468
 URL: https://issues.apache.org/jira/browse/ARROW-5468
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Go
Reporter: Sebastien Binet


[jira] [Created] (ARROW-5467) [Go] implement read/write IPC for Time32/Time64 arrays

2019-05-31 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5467:
--

 Summary: [Go] implement read/write IPC for Time32/Time64 arrays
 Key: ARROW-5467
 URL: https://issues.apache.org/jira/browse/ARROW-5467
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Go
Reporter: Sebastien Binet


[jira] [Created] (ARROW-5469) [Go] implement read/write IPC for Date32/Date64 arrays

2019-05-31 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5469:
--

 Summary: [Go] implement read/write IPC for Date32/Date64 arrays
 Key: ARROW-5469
 URL: https://issues.apache.org/jira/browse/ARROW-5469
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Go
Reporter: Sebastien Binet


[jira] [Created] (ARROW-5493) [Integration/Go] add Go support for IPC integration tests

2019-06-03 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5493:
--

 Summary: [Integration/Go] add Go support for IPC integration tests
 Key: ARROW-5493
 URL: https://issues.apache.org/jira/browse/ARROW-5493
 Project: Apache Arrow
  Issue Type: Test
  Components: Go, Integration
Reporter: Sebastien Binet


it would be great to add support for the cross-language integration tests of 
the IPC file/stream format:

- [https://github.com/apache/arrow/tree/master/integration]

- [https://github.com/apache/arrow/blob/master/integration/integration_test.py]


[jira] [Created] (ARROW-5551) [Go] invalid FixedSizeArray representation

2019-06-11 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5551:
--

 Summary: [Go] invalid FixedSizeArray representation
 Key: ARROW-5551
 URL: https://issues.apache.org/jira/browse/ARROW-5551
 Project: Apache Arrow
  Issue Type: Bug
  Components: Go
Reporter: Sebastien Binet
Assignee: Sebastien Binet


FixedSizeArrays are currently represented with 3 buffers, but the C++ definition expects a 2-buffer data layout (like all the primitive arrays).

(This was uncovered while trying to round-trip all the "integration" tests.)
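
Assuming "FixedSizeArrays" here means fixed-size binary arrays, a simplified, hypothetical model of the expected 2-buffer layout looks like this: a validity bitmap plus a values buffer, with no offsets buffer, since every element has the same width. `fixedSizeBinary` is an illustrative type, not the Go Arrow implementation.

```go
package main

import "fmt"

// fixedSizeBinary models the 2-buffer layout:
//
//	buffers[0]: validity bitmap, bit i set means element i is non-null
//	buffers[1]: values, byteWidth bytes per element, stored back to back
//
// No offsets buffer is needed: element i always starts at i*byteWidth.
type fixedSizeBinary struct {
	buffers   [2][]byte
	byteWidth int
	length    int
}

// isValid reads bit i of the validity bitmap (least-significant bit first).
func (a fixedSizeBinary) isValid(i int) bool {
	return a.buffers[0][i/8]&(1<<uint(i%8)) != 0
}

// value slices element i straight out of the values buffer.
func (a fixedSizeBinary) value(i int) []byte {
	start := i * a.byteWidth
	return a.buffers[1][start : start+a.byteWidth]
}

// String prints each element, or (null) for invalid slots.
func (a fixedSizeBinary) String() string {
	out := make([]string, a.length)
	for i := range out {
		if a.isValid(i) {
			out[i] = string(a.value(i))
		} else {
			out[i] = "(null)"
		}
	}
	return fmt.Sprint(out)
}
```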