Re: [Pharo-users] LiteratureResearcher - where graphs, PDFs, and BibTex happily live together

Manuel Leuenberger Thu, 09 Nov 2017 06:40:01 -0800

Hi everyone,

My estimate of packaging everything over the weekend was overly optimistic. 
There were just too many issues with portability and dependencies, leading to a 
long chain of installation requirements. Nevertheless, I decided to publish 
what I have so far; maybe some of you have more experience in making Pharo 
applications like this portable. Python especially is giving me a headache 
(literally).

So, here are two source repositories and an all-in-one package, which may work 
for some of you (still macOS only):

1. LiteratureResearcher sources (https://github.com/maenu/LiteratureResearcher) 
contain the main sources and make scripts. Loading the project through 
Metacello should also build it; this takes a while.
2. PharoUriScheme sources (https://github.com/maenu/PharoUriScheme) contain 
the sources for the pharo:// protocol wrapper. This is mainly an Xcode 
project, which is what introduces the platform dependency. The project can 
also be extended to support Linux and Windows.
3. The all-in-one package 
(https://figshare.com/articles/LiteratureResearcher_All-in-one/5584837) 
contains the LiteratureResearcher, but still requires Java, Python 2.7, and 
virtualenv (shame on you, Python). It might work for some of you.

Comments and contributors are very welcome: post issues, fork, create PRs, 
extend!

Cheers,
Manuel

> On 2 Nov 2017, at 20:33, Manuel Leuenberger <leuenber...@inf.unibe.ch> wrote:
> 
> Hi Stef,
> 
> The PDF integration consists of three parts:
> 
> 1. CERMINE (https://github.com/CeON/CERMINE) is fed with the PDF and outputs 
> metadata as BibTeX and as structured XML (title, authors, affiliations, 
> abstract, keywords, references, …). This is not perfect, but way better than 
> any other metadata extractor I could find.
> 2. From the metadata I generate hyperlinks that are anchored in the PDF by a 
> text key. pdf-linker (https://github.com/maenu/pdf-linker) then searches for 
> the anchors in the PDF text using heuristics, because the PDF document model 
> is primarily intended for rendering and printing, not for processing. The 
> hyperlinks are then inserted using the awesome Apache PDFBox 
> (https://pdfbox.apache.org/).
> 3. Those hyperlinks point to a URI like 
> “pharo://handle/clickReference.in.?args=1&args=2”, which represents reference 1 
> in paper 2. Now comes the magic part: the OS allows you to register custom 
> handlers for custom URI schemes like pharo://. For that I created a simple 
> Objective-C app that handles the event and passes it on as an HTTP message to 
> a server running in Pharo (https://github.com/maenu/PharoUriScheme). The OS 
> will even start the application if it is not yet running.
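Step 2 leaves open how an anchor is located in text extracted from a PDF, where words may be split by end-of-line hyphenation and spacing is unreliable. The following is a minimal Python sketch of one plausible heuristic, not pdf-linker's actual implementation (which the thread does not describe): it lower-cases the text, drops hyphen-plus-line-break sequences, collapses whitespace runs, searches for the normalized anchor, and maps the match back to offsets in the original extracted text. The function names normalize_with_map and find_anchor are hypothetical.

```python
from __future__ import annotations


def normalize_with_map(text: str) -> tuple[str, list[int]]:
    """Normalize extracted PDF text for matching (hypothetical heuristic).

    Lower-cases letters, removes end-of-line hyphenation ("-" followed by a
    line break), and collapses runs of whitespace into a single space.
    Returns the normalized string plus, for every normalized character, the
    index of the original character it came from.
    """
    chars: list[str] = []
    index_map: list[int] = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch == "-" and i + 1 < n and text[i + 1] == "\n":
            # Hyphenated line break: skip "-\n" and any indentation after it.
            i += 2
            while i < n and text[i] in " \t":
                i += 1
            continue
        if ch.isspace():
            # Emit one space per whitespace run (none at the very start).
            if chars and chars[-1] != " ":
                chars.append(" ")
                index_map.append(i)
            i += 1
            continue
        chars.append(ch.lower())
        index_map.append(i)
        i += 1
    return "".join(chars), index_map


def find_anchor(page_text: str, anchor: str) -> tuple[int, int] | None:
    """Return (start, end) offsets of anchor in page_text, or None."""
    norm_text, index_map = normalize_with_map(page_text)
    norm_anchor = normalize_with_map(anchor)[0].strip()
    if not norm_anchor:
        return None
    pos = norm_text.find(norm_anchor)
    if pos < 0:
        return None
    start = index_map[pos]
    end = index_map[pos + len(norm_anchor) - 1] + 1  # exclusive end offset
    return start, end
```

For instance, with the extracted text "as shown by Leuen-\nberger  et al.", find_anchor(text, "Leuenberger et al.") returns (12, 33), the span covering the hyphenated original. Dropping every hyphen at a line break also wrongly joins genuine compounds that happen to break after a hyphen, which is one reason such matching stays heuristic.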
> 
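The hand-off in step 3, from the OS to Pharo, can be sketched as follows. The Python below only illustrates what the forwarding app has to do (the actual wrapper in PharoUriScheme is Objective-C and is not reproduced here): split a pharo:// URI into its parts and rewrite it as an HTTP URL for the server in the Pharo image. The port 8080, the localhost path layout, the names PharoMessage, parse_pharo_uri and to_http_url, and the reading of clickReference.in. as the keyword selector clickReference:in: are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from urllib.parse import SplitResult, parse_qs, urlsplit, urlunsplit

# Hypothetical port of the HTTP server running inside the Pharo image.
PHARO_HTTP_PORT = 8080


@dataclass
class PharoMessage:
    target: str      # URI host part, e.g. "handle"
    selector: str    # e.g. "clickReference:in:"
    args: list[str]  # repeated "args" query values, in order


def _split_pharo_uri(uri: str) -> SplitResult:
    """Split the URI, rejecting anything that is not pharo://."""
    parts = urlsplit(uri)
    if parts.scheme != "pharo":
        raise ValueError(f"not a pharo:// URI: {uri!r}")
    return parts


def parse_pharo_uri(uri: str) -> PharoMessage:
    """Decode a pharo:// URI into target, selector, and arguments."""
    parts = _split_pharo_uri(uri)
    raw_selector = parts.path.lstrip("/")
    # Assumption: each "." stands in for a keyword-selector colon.
    selector = raw_selector.replace(".", ":")
    args = parse_qs(parts.query).get("args", [])
    return PharoMessage(parts.netloc, selector, args)


def to_http_url(uri: str, port: int = PHARO_HTTP_PORT) -> str:
    """Rewrite a pharo:// URI as a localhost HTTP URL (hypothetical layout)."""
    parts = _split_pharo_uri(uri)
    path = f"/{parts.netloc}{parts.path}"  # host becomes first path segment
    return urlunsplit(("http", f"localhost:{port}", path, parts.query, ""))
```

With the example URI above, parse_pharo_uri yields the selector clickReference:in: with the arguments ["1", "2"], and to_http_url returns http://localhost:8080/handle/clickReference.in.?args=1&args=2. Because any document can contain a pharo:// link, a receiving server would reasonably accept only a known set of selectors instead of performing arbitrary ones.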
> While the custom URI scheme approach is super powerful, it has critical 
> drawbacks. Any application can request to be the receiver of a URI scheme, 
> just as browsers are for http://. Especially on mobile devices with limited 
> access to the OS, this opens up an attack point for malware apps that 
> replicate original apps that use schemes like facebook:// and eavesdrop on 
> all interactions. If an original app transmits any unencrypted secrets or 
> user data encoded in those URIs, malware can easily intercept it without the 
> user noticing the leak. I guess this is why many PDF viewers only support the 
> standard http:// and mailto: schemes. For example, macOS Preview just beeps 
> when I click on a pharo:// link, and Chrome’s viewer doesn’t even bother 
> giving any feedback. Only Adobe Acrobat allows you to relax its security 
> settings to make such links work (how could it be anyone other than Adobe, 
> when it’s a security issue? ;).
> 
> I finished basic packaging today and will continue with some READMEs and a 
> nearly-all-in-one distribution tomorrow. I’ll keep you posted in this thread.
> 
> Cheers,
> Manuel
> 
>> On 2 Nov 2017, at 18:08, Stephane Ducasse <stepharo.s...@gmail.com> wrote:
>> 
>> Hi Manuel
>> 
>> This is super cool :)
>> Could you describe how you did the pdf integration?
>> And yes please package it :)
>> I want to try it.
>> 
>> Stef
>> 
>> On Wed, Nov 1, 2017 at 10:16 PM, Manuel Leuenberger
>> <leuenber...@inf.unibe.ch> wrote:
>>> Hi everyone,
>>> 
>>> Over the last few weeks, I have been experimenting with my take on literature
>>> research. To me, the corpus of scientific papers forms an interconnected
>>> graph, not the plain lists and tables we keep in our bibliographies. So
>>> here is the first prototype: it has Google Scholar integration for search,
>>> can fetch PDFs from IEEE and ACM, and extracts metadata from PDFs. All of
>>> this results in hyperlinked PDFs!
>>> 
>>> See a demo here: https://youtu.be/EcK3Pt_WnEw
>>> The slides from the SCG seminar are here:
>>> http://scg.unibe.ch/download/softwarecomposition/2017-10-31-Leuenberger-ILE.pdf
>>> 
>>> I plan on packaging it, so that those who are interested can check it out
>>> themselves (help wanted!). Currently, it only works on macOS.
>>> 
>>> What do you think of my approach? Which use cases should be added?
>>> 
>>> Cheers,
>>> Manuel
>>> 
>> 
>
