Thanks, Álvaro! rows looks top-notch; I'll check it out! I too have support for extracting tables from images on my roadmap, so I'll drop by the rows Gitter channel to discuss this further! :)
On Sat, Sep 29, 2018 at 1:40 AM Álvaro Justen [Turicas] <alvarojus...@gmail.com> wrote:
> Hi, Vinayak! Good work, thanks for sharing. :)
>
> I'm the creator of the rows library [http://turicas.info/rows] and
> implemented PDF support early this year (with 3 different strategies).
> It's not released on PyPI yet since I'm fixing some bugs before
> releasing the next version, but you can try it out by installing:
>
>     pip install git+https://github.com/turicas/rows.git@feature/plugin-pdf#egg=rows pdfminer.six cached-property
>
> It's 100% written in Python and also has a command-line interface, so
> you can run:
>
>     rows convert http://example.com/file.pdf newfile.(csv|xls|xlsx|html|sqlite)
>
> or even:
>
>     rows query "SELECT * FROM table1 WHERE some_condition" http://example.com/file.pdf --output=result.xls
>
> The idea behind the extraction algorithms is to be flexible, so you
> can plug in your own if you want (depending on how the PDF is created,
> the objects will be very different and you cannot use the same
> ordering/grouping strategy).
>
> I'm now implementing support for extracting tables from images (and also
> from PDFs containing images), but it's probably not going into the next
> version since I need a better OCR tool. What do you think about joining
> efforts so we can have better libraries? I'm going to test the PDFs
> you've cited with my code so we can compare better. Feel free to
> contact me directly or join the chat at https://gitter.im/turicas/rows
>
> Cheers,
> Álvaro Justen "Turicas"
> turicas.info / @turicas (twitter, github, youtube)
> +55 41 999 311 221
>
> On Fri, Sep 28, 2018 at 11:43 AM Vinayak Mehta <vmeht...@gmail.com> wrote:
> >
> > Hello everyone!
> >
> > I recently released a Python library that lets users extract data
> > tables out of PDF files, my first open source library! Here's the link:
> > https://github.com/socialcopsdev/camelot
> >
> > I've created a wiki page comparing it to other open source PDF table
> > extraction tools. I'm currently working on porting it to Python 3!
> >
> > I would be really grateful if you could check it out, see if it's
> > useful to you, and give me any feedback that may help me improve it,
> > by replying here, opening an issue, or sending a pull request!
> >
> > Looking forward to hearing from you all!
> >
> > Thanks for your time!
> >
> > Vinayak
> > _______________________________________________
> > PSF-Community mailing list
> > PSF-Community@python.org
> > https://mail.python.org/mailman/listinfo/psf-community
>