Re: [O] Extract document structure from Org file

John Kitchin Fri, 03 Jul 2015 07:21:08 -0700

That sounds really cool. I recently hacked together a swish-e index of my
org files (there might have been 3000+!):
http://kitchingroup.cheme.cmu.edu/blog/2015/06/25/Integrating-swish-e-and-Emacs/

I just updated it to index the html version of an org-file so that I
can take advantage of the structure in the search:
http://kitchingroup.cheme.cmu.edu/blog/2015/07/03/Using-swish-e-to-index-org-files-as-html/

It would be cool to have more granular searching though.
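
To make "more granular" concrete, here is a rough Python sketch of splitting
an Org file into typed chunks that an indexer could weight differently. It is
only an illustration under simplifying assumptions: the function name
split_org and the chunk labels are made up for this example, it only
recognises headlines, "#+KEY: value" lines, src blocks and plain paragraphs,
and it is not how swish-e, Sphinx or Org's own parser (org-element) work.

```python
import re

# Simplified patterns; real Org syntax has many more cases than these.
HEADING = re.compile(r"^(\*+)\s+(.*)$")        # "* Title", "** Subtitle"
KEYWORD = re.compile(r"^#\+(\w+):\s*(.*)$")    # "#+TITLE: Something"
BEGIN_SRC = re.compile(r"^\s*#\+begin_src\b", re.IGNORECASE)
END_SRC = re.compile(r"^\s*#\+end_src\b", re.IGNORECASE)


def split_org(text):
    """Return a list of (kind, content) tuples for a string of Org text.

    kind is one of "heading", "keyword", "code" or "paragraph".
    Hypothetical helper for illustration, not part of any library.
    """
    chunks = []
    paragraph = []   # lines of the paragraph being collected
    code = None      # list of lines while inside a src block, else None

    def flush():
        # Emit the pending paragraph, if any, as a single chunk.
        if paragraph:
            chunks.append(("paragraph", " ".join(paragraph)))
            paragraph.clear()

    for line in text.splitlines():
        if code is not None:
            if END_SRC.match(line):
                chunks.append(("code", "\n".join(code)))
                code = None
            else:
                code.append(line)
            continue
        if BEGIN_SRC.match(line):
            flush()
            code = []
            continue
        m = HEADING.match(line)
        if m:
            flush()
            chunks.append(("heading", m.group(2)))
            continue
        m = KEYWORD.match(line)
        if m:
            flush()
            chunks.append(("keyword", f"{m.group(1)}: {m.group(2)}"))
            continue
        if line.strip():
            paragraph.append(line.strip())
        else:
            flush()  # a blank line ends the current paragraph

    flush()
    if code is not None:
        # Unterminated src block: keep what was collected.
        chunks.append(("code", "\n".join(code)))
    return chunks
```

Each (kind, content) pair could then go into a separate field of the index,
so that a match in a heading can be ranked above a match in body text.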

Is your info project visible anywhere? I can imagine a close-file hook
function that updates the database automatically.

Oleg Sivokon writes:

> Hello list!
>
> Suppose I wanted to extract the structure from an Org document, where,
> what's important for me would be to have it categorically divided into
> headers, paragraphs of text, technical information and inclusion of
> other documents (code snippets).  How would I do it?
>
> The reason I'm asking is that I've a small project I work on, where I'm
> trying to enhance the search in documents by using indexing combined
> with queries based on things like distance between words, frequency of a
> word appearing in a document and so on.  (I'm using Sphinx for it.)
> I've tried to do this with Info pages, and I liked the results, however,
> in order to do this more intelligently, I'd like to index the documents
> with better granularity (i.e. so that later on I could search assigning
> different weights to words appearing in headers and words appearing in
> comments).
>
> Best.
>
> Oleg

--
Professor John Kitchin
Doherty Hall A207F
Department of Chemical Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
412-268-7803
@johnkitchin
http://kitchingroup.cheme.cmu.edu
