That's odd. Factor's "which" just searches the directories in $PATH for your executable, so check what Factor sees there:

    IN: scratchpad "PATH" os-env
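
To make that value easier to compare with the terminal, it can be split into its directories. This is only a sketch: `path-dirs` is a name made up for illustration, and it assumes a Unix-style ":" separator. If /usr/local/bin is missing from the list, one possible cause is that Factor was started from the Finder or Dock rather than from the shell, in which case it does not inherit the PATH that zsh sets up.

```factor
USING: environment prettyprint splitting ;
IN: scratchpad

! Hypothetical helper: list the directories in the PATH that
! Factor itself sees. Assumes a Unix-style ":" separator.
: path-dirs ( -- seq ) "PATH" os-env ":" split ;

! Print the list so it can be compared with the shell's PATH.
path-dirs .
```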

You can read a bit about how it's implemented cross-platform:

    http://re-factor.blogspot.com/2013/01/which.html
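
If the directories the shell reports turn out to be missing from Factor's PATH, something like the following might work around it. This is a rough sketch, not a tested fix: `add-to-path` and `docsplit-in` are hypothetical names, and it assumes PATH is already set and that child processes inherit the changed value.

```factor
USING: arrays environment io.directories io.launcher kernel
sequences ;
IN: scratchpad

! Hypothetical helper: append one directory to the PATH that
! Factor passes on to child processes. Assumes PATH is already set.
: add-to-path ( dir -- )
    "PATH" os-env ":" rot 3append "PATH" set-os-env ;

! Hypothetical helper: run docsplit on one pdf from inside its
! directory. try-process throws on a non-zero exit status, so a
! failed call is no longer silent.
: docsplit-in ( dir pdf lang -- )
    swap 2array { "docsplit" "text" "--no-clean" "-l" } prepend
    [ try-process ] curry with-directory ;
```

Usage, with the directories taken from the terminal output quoted below:

    "/usr/local/bin" add-to-path
    "/usr/local/opt/ruby/bin" add-to-path
    "/path/to/1_long_gu" "long_gu001.pdf" "chi_sim" docsplit-in

Using try-process instead of "run-process drop" should at least turn the "no errors, just no result" case into a visible error.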


On Sat, Feb 8, 2014 at 2:30 PM, CW Alston <cwalsto...@gmail.com> wrote:

> Thanks for the replies. Maybe a clue here - I get this from "which":
>
> IN: scratchpad USE: tools.which
> IN: scratchpad "docsplit" which .
> f
> IN: scratchpad "couchdb" which .
> f
> IN: scratchpad "ruby" which  .
> f
>
> Whereas in the terminal:
>
> ➜  ~ git:(master) ✗ which docsplit
> /usr/local/opt/ruby/bin/docsplit
>
> ➜  ~ git:(master) ✗ which couchdb
> /usr/local/bin/couchdb
>
> ➜  ~ git:(master) ✗ which ruby
> /usr/local/bin/ruby
>
> Let me try moving up to the most recent development release
> & see if the problem disappears. I'll get back to you.
>
> Best,
> ~cw
>
>
>
> On Sat, Feb 8, 2014 at 7:42 AM, John Benediktsson <mrj...@gmail.com> wrote:
>
>> Well if you want process output, you can do something like:
>>
>>     { "docsplit" "text" "--no-clean" "-l" "path" } utf8 [ lines ] with-process-reader
>>
>> or without output, using a single command string:
>>
>>     "docsplit text --no-clean -l path" run-process drop
>>
>> You can docsplit a directory of files:
>>
>>     ! "eng" is only an example; use the language each pdf needs
>>     : docsplit ( file -- )
>>         { "docsplit" "text" "--no-clean" "-l" "eng" }
>>         swap suffix run-process drop ;
>>
>>     : docsplit-all ( path -- )
>>         directory-files [ docsplit ] each ;
>>
>> And concatenate all the files in a directory:
>>
>>     # bash
>>     ls *.txt | sort | xargs -I '{}' cat '{}'
>>
>>     # factor
>>     : cat-results ( path -- lines )
>>         directory-files [ ".txt" tail? ] filter natural-sort
>>         [ utf8 file-lines ] map concat ;
>>
>> Or something like that. Which part are you having problems with?
>>
>> Best,
>> John.
>>
>>
>>
>> On Sat, Feb 8, 2014 at 2:32 AM, CW Alston <cwalsto...@gmail.com> wrote:
>>
>>> Hi folks -
>>>
>>> I am thrilled to find a versatile open-source optical character recognition
>>> engine called docsplit <http://documentcloud.github.io/docsplit/>. I've got
>>> it installed easily as a ruby gem, & it works just great on my Mac as a
>>> shell command (it also provides a ruby module):
>>>
>>> ➜  ~ git:(master) ✗ which docsplit
>>> /usr/local/opt/ruby/bin/docsplit
>>> ➜  ~ git:(master) ✗
>>>
>>> I need such a tool to extract text from a deep directory tree, with a couple
>>> thousand folders. Each leaf folder contains 3-6 scanned pdfs (in Chinese &
>>> English), from which docsplit makes a plaintext (.txt) file with the same
>>> basename, deposited in the same leaf directory. My Factor vocab can easily
>>> visit each leaf dir & prepare to pass each pdf there to docsplit in the
>>> format it happily handles in the terminal (I use oh-my-zsh & iTerm2).
>>> My Factor code chokes on this intermediate step, trying to call docsplit.
>>>
>>> Going to the terminal, I have to first cd to the directory containing the pdfs, e.g.,
>>>
>>> ➜  ~ git:(master) ✗ cd /path/to/1_long_gu
>>>
>>> then call docsplit with the appropriate flags on each pdf:
>>>
>>> ➜  1_long_gu git:(master) ✗ docsplit text --no-clean -l chi_sim long_gu001.pdf
>>> ➜  1_long_gu git:(master) ✗ docsplit text --no-clean -l eng long_gu002.pdf
>>>
>>> etc., for each pdf, & docsplit gives back a bunch of text files in the dir like
>>>
>>> /path/to/1_long_gu/long_gu001.txt
>>>
>>> In the terminal, even a compound phrase like the following works without a hitch:
>>>
>>> ➜  ~ git:(master) ✗ cd /path/to/1_long_gu ; docsplit text --no-clean -l chi_sim long_gu001.pdf ; docsplit text --no-clean -l eng long_gu002.pdf ; docsplit text --no-clean -l eng long_gu003.pdf ;...
>>> ➜  1_long_gu git:(master) ✗
>>>
>>> So, working from the terminal, I wind up with a series of text files in
>>> /path/to/1_long_gu that my Factor vocab amalgamates into a single text file
>>> (with whitespace in filename), e.g., /path/to/1_long_gu/long gu.txt, which
>>> I can edit for mistakes, and upload to a couchdb database.
>>> Joy!
>>>
>>> But I haven't been able to work out how to accomplish this docsplit call
>>> from Factor code. I have no problem traversing the directory tree (Factor's
>>> word each-file & the like come in very handy). I've experimented with
>>> io.launcher, io.pipes, shell scripts (bash, zsh, factor), & autoload shell
>>> functions, but flunked out. No errors with io.launcher tries; just no result.
>>> Need to learn something here. I routinely launch couchdb as a detached
>>> <process>.
>>>
>>> It would be such a boon to use docsplit in Factor. After a couple weeks lost
>>> at sea with this, I'm broadcasting a Mayday. Any suggestions?
>>>
>>> Thanks in advance,
>>> ~cw
>>>
>>> --
>>> *~ Memento Amori*
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Managing the Performance of Cloud-Based Applications
>>> Take advantage of what the Cloud has to offer - Avoid Common Pitfalls.
>>> Read the Whitepaper.
>>>
>>> http://pubads.g.doubleclick.net/gampad/clk?id=121051231&iu=/4140/ostg.clktrk
>>> _______________________________________________
>>> Factor-talk mailing list
>>> Factor-talk@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/factor-talk
>>>
>>>
>>
>
>
> --
> *~ Memento Amori*
>