The library's API is pretty simple and intuitive too! You can check it out in the README :)
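
For anyone skimming the thread, here is a rough sketch of the workflow from the quoted message: fetch a PDF with requests, pull the tables from every page with Camelot, then concatenate them. It is not the code from the gist. The function names and the injectable read_pdf parameter are hypothetical, chosen only for illustration, and the sketch assumes that camelot.read_pdf accepts a pages string such as "1-end" and that each returned table exposes a pandas DataFrame as .df.

```python
def fetch_pdf(url, dest_path):
    """Download the PDF at `url` and save it to `dest_path`."""
    import requests  # third-party; imported lazily so the module loads without it

    response = requests.get(url, timeout=60)
    response.raise_for_status()  # fail loudly on HTTP errors
    with open(dest_path, "wb") as fh:
        fh.write(response.content)
    return dest_path


def extract_frames(pdf_path, pages="1-end", read_pdf=None):
    """Return one DataFrame per table detected on the requested pages.

    `read_pdf` is a hypothetical injection point (not part of Camelot) that
    lets the function be exercised without Camelot installed. By default it
    falls back to camelot.read_pdf.
    """
    if read_pdf is None:
        import camelot  # third-party

        read_pdf = camelot.read_pdf
    # Assumption: every table object carries its data as a DataFrame in `.df`.
    return [table.df for table in read_pdf(pdf_path, pages=pages)]


def combine(frames):
    """Stack the per-table DataFrames into a single DataFrame."""
    import pandas as pd  # third-party

    return pd.concat(frames, ignore_index=True)
```

With a real PDF link in hand, usage would look roughly like combine(extract_frames(fetch_pdf(url, "report.pdf"))). Tables that repeat a header row on every page would still need that row dropped before or after concatenation.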

On Sat, Sep 29, 2018 at 1:06 AM Vinayak Mehta <vmeht...@gmail.com> wrote:

> Hello David!
>
> Yes, I've created a wiki page comparing Camelot with other open source
> tools and libraries. tabula-py is a wrapper over tabula-java, which is used
> by Tabula. You can check out the comparison of Camelot with Tabula here
> <https://github.com/socialcopsdev/camelot/wiki/Comparison-with-other-PDF-Table-Extraction-libraries-and-tools#tabula>.
> As you can see in the comparison, it outperforms Tabula in almost all cases!
>
> While Tabula either gives good output or fails miserably, Camelot
> gives you complete control over the extraction process with various
> configuration parameters! You can check out this section of the README
> <https://github.com/socialcopsdev/camelot#why-camelot> for more
> information. Camelot also lets you plot various geometries like detected
> lines, intersections, tables in the PDF to debug and improve table
> extraction! You can check out this part of the documentation
> <https://camelot-py.readthedocs.io/en/latest/user/advanced.html#plot-geometry>
> for more information on that.
>
> Try it out!
>
> Vinayak
>
> On Sat, Sep 29, 2018 at 12:34 AM David Mertz <me...@gnosis.cx> wrote:
>
>> Have you compared your tool with existing ones, such as
>> https://blog.chezo.uno/tabula-py-extract-table-from-pdf-into-python-dataframe-6c7acfa5f302
>> ?
>>
>> What notable difference in API and/or accuracy do you have?
>>
>> On Fri, Sep 28, 2018 at 2:32 PM Vinayak Mehta <vmeht...@gmail.com> wrote:
>>
>>> I've created a Jupyter notebook which shows an example of how Camelot makes
>>> it easy to extract tables out of PDFs.
>>>
>>> In the example, I scrape a PDF from an Indian disease outbreaks data
>>> source[1] using requests, extract tables from
>>> each page of the PDF using Camelot and then concat those tables. Here's the
>>> gist! https://gist.github.com/vinayak-mehta/e5949f7c2410a0e12f25d3682dc9e873
>>> :)
>>>
>>> [1] http://idsp.nic.in/index4.php?lang=1&level=0&linkid=406&lid=3689
>>>
>>> On Fri, Sep 28, 2018 at 12:01 PM Vinayak Mehta <vmeht...@gmail.com>
>>> wrote:
>>>
>>>> Hello everyone!
>>>>
>>>> I recently released a Python library which lets users extract data
>>>> tables out of PDF files, my first open source library! Here's the link:
>>>> https://github.com/socialcopsdev/camelot
>>>>
>>>> I've created a wiki page
>>>> <https://github.com/socialcopsdev/camelot/wiki/Comparison-with-other-PDF-Table-Extraction-libraries-and-tools>
>>>> comparing it to other open source PDF table extraction tools. I'm currently
>>>> working on porting it to Python3!
>>>>
>>>> I would be really grateful if you could check it out and see if it's
>>>> useful to you and give me any feedback that may help me improve it, by
>>>> replying here, opening an issue or a pull request!
>>>>
>>>> Looking forward to hearing from you all!
>>>>
>>>> Thanks for your time!
>>>>
>>>> Vinayak
>>>>
>>> _______________________________________________
>>> PSF-Community mailing list
>>> PSF-Community@python.org
>>> https://mail.python.org/mailman/listinfo/psf-community
>>>
>>
>> --
>> Keeping medicines from the bloodstreams of the sick; food
>> from the bellies of the hungry; books from the hands of the
>> uneducated; technology from the underdeveloped; and putting
>> advocates of freedom in prisons. Intellectual property is
>> to the 21st century what the slave trade was to the 16th.
>>