Bug#478091: [tracker] Use odt2txt instead of the removed o3read

2008-05-21 Thread Vincent Lefevre
Hi,

On 2008-05-21 11:25:25 -0300, Nelson A. de Oliveira wrote:
> Vincent, is there a newer version of sxw2txt available, please? (or
> the one that I found is the latest one?)

I use my own version available on my web pages:

http://www.vinc17.org/unix/index.en.html#sxw2txt
http://www.vinc17.org/software/sxw2txt (for the Perl script itself).

-- 
Vincent Lefèvre <[EMAIL PROTECTED]>
Work: CR INRIA - computer arithmetic / Arenaire project (LIP, ENS-Lyon)



--
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]



Bug#478091: [tracker] Use odt2txt instead of the removed o3read

2008-05-21 Thread Nelson A. de Oliveira
Hi!

On Wed, 21 May 2008 16:47:15 +0200
Michael Biebl <[EMAIL PROTECTED]> wrote:
> I was not entirely right. My foo.sxw document must be older.
> I opened it in OOo 2.4 and SO 7 and saved it under a different name
> (as sxw) and now I can convert it with odt2txt
> 
> The only difference I could easily spot, is that the new sxw archive 
> contains a file called "mimetype" now, while the original foo.sxw
> doesn't.

So it seems that we will need sxw2txt, since odt2txt won't convert some
old .sxw files, right?

Best regards,
Nelson




Bug#478091: [tracker] Use odt2txt instead of the removed o3read

2008-05-21 Thread Michael Biebl

David wrote:
>> Any other alternatives?
>
> What about asking the maintainers of recoll and beagle about the solution
> they use?
>
> I do not use beagle because it devours memory, but I think it reads
> OpenOffice. Regarding recoll, it also reads OpenOffice, is very nice
> and has many more features than tracker, because it allows searching by
> chain, stem expansion, etc; at the end of the day, it is a KDE application,
> and KDE users do not mind being "confused" by features :-D  :-D

Never mind, I'm using both GNOME and KDE ;-)

> I hope you have a sense of humour :-D , but the first sentence is a real
> suggestion (investigating how beagle and recoll do it).

recoll seems to use some gross awk/sed/shell mix. Definitely not
something I want to copy and maintain. (I'm also not sure if it handles
all subtleties of the ODF format correctly).


Cheers,
Michael

--
Why is it that all of the instruments seeking intelligent life in the
universe are pointed away from Earth?





Bug#478091: [tracker] Use odt2txt instead of the removed o3read

2008-05-21 Thread Michael Biebl

Nelson A. de Oliveira wrote:
> Hi!
>
> On Wed, May 21, 2008 at 11:36 AM, Michael Biebl <[EMAIL PROTECTED]> wrote:
>> Then I must be doing something wrong:
>>
>> $ odt2txt foo.sxw
>> Can't read from foo.sxw: Is it an OpenDocument Text?
>>
>> foo.sxw being a document created with StarOffice 7.0 (i.e. OpenOffice 1.x)
>
> Here is one example:
> http://people.debian.org/~naoliv/misc/478091.sxw
>
> It was created using OpenOffice 2.4 however (are there differences in
> creating .sxw using different OO versions?)

I was not entirely right. My foo.sxw document must be older.
I opened it in OOo 2.4 and SO 7 and saved it under a different name (as
sxw) and now I can convert it with odt2txt.

The only difference I could easily spot is that the new sxw archive
contains a file called "mimetype" now, while the original foo.sxw doesn't.
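
For anyone who wants to check their own files, the comparison above can be sketched in Python. This is only an illustration, assuming the document is an ordinary zip archive; the helper name read_mimetype is made up for this example, and whether odt2txt really depends on the "mimetype" entry is a guess based on the observation above, not something confirmed in this thread.

```python
import zipfile


def read_mimetype(path_or_file):
    """Return the content of the archive's 'mimetype' member, or None.

    Hypothetical helper for illustration only. An OpenOffice.org 1.x Writer
    file normally declares application/vnd.sun.xml.writer, while an ODF text
    document declares application/vnd.oasis.opendocument.text.
    """
    with zipfile.ZipFile(path_or_file) as archive:
        if "mimetype" not in archive.namelist():
            # Archives like the original foo.sxw described above end up here.
            return None
        raw = archive.read("mimetype")
    return raw.decode("ascii", errors="replace").strip()
```

Calling read_mimetype("foo.sxw") on an archive without the entry returns None; on a re-saved file it returns the declared type as a string.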


Cheers,
Michael



Bug#478091: [tracker] Use odt2txt instead of the removed o3read

2008-05-21 Thread Nelson A. de Oliveira
Hi!

On Wed, May 21, 2008 at 11:36 AM, Michael Biebl <[EMAIL PROTECTED]> wrote:
> Then I must be doing something wrong:
>
> $ odt2txt foo.sxw
> Can't read from foo.sxw: Is it an OpenDocument Text?
>
> foo.sxw being a document created with StarOffice 7.0 (i.e. OpenOffice 1.x)

Here is one example:
http://people.debian.org/~naoliv/misc/478091.sxw

It was created using OpenOffice 2.4 however (are there differences in
creating .sxw using different OO versions?)

Best regards,
Nelson






Bug#478091: [tracker] Use odt2txt instead of the removed o3read

2008-05-21 Thread Michael Biebl

Nelson A. de Oliveira wrote:
> Hi again!
>
> odt2txt (from the odt2txt package) seems to extract text from sxw (at
> least I think that I am correctly testing it here; file says that the
> document is "OpenOffice.org 1.x Writer document").
>
> The final result isn't the same when using sxw2txt (some formatting
> differences), but it works.
> Can somebody confirm this, please?

Then I must be doing something wrong:

$ odt2txt foo.sxw
Can't read from foo.sxw: Is it an OpenDocument Text?

foo.sxw being a document created with StarOffice 7.0 (i.e. OpenOffice 1.x)



Bug#478091: [tracker] Use odt2txt instead of the removed o3read

2008-05-21 Thread Nelson A. de Oliveira
Hi again!

odt2txt (from the odt2txt package) seems to extract text from sxw (at
least I think that I am correctly testing it here; file says that the
document is "OpenOffice.org 1.x Writer document").

The final result isn't the same when using sxw2txt (some formatting
differences), but it works.
Can somebody confirm this, please?

Thank you!

Best regards,
Nelson






Bug#478091: [tracker] Use odt2txt instead of the removed o3read

2008-05-21 Thread David
> Any other alternatives?

What about asking the maintainers of recoll and beagle about the solution
they use?

I do not use beagle because it devours memory, but I think it reads
OpenOffice. Regarding recoll, it also reads OpenOffice, is very nice
and has many more features than tracker, because it allows searching by
chain, stem expansion, etc; at the end of the day, it is a KDE application,
and KDE users do not mind being "confused" by features :-D  :-D

I hope you have a sense of humour :-D , but the first sentence is a real
suggestion (investigating how beagle and recoll do it).

Thank you,

David


Bug#478091: [tracker] Use odt2txt instead of the removed o3read

2008-05-21 Thread Nelson A. de Oliveira
Hi people!

On Wed, May 21, 2008 at 10:14 AM, Michael Biebl <[EMAIL PROTECTED]> wrote:
>> Unfortunately, unoconv needs the complete openoffice.org up and running
>> (and an X server to host it). This may be a bit heavy for an indexing
>> solution.
>
> Hm, yeah, that's indeed a bit heavy weight ;-)
>
> Any other alternatives?

There is sxw2txt [1][2], which was once part of an ITP [3].
It looks very simple and, from my tests here, it works properly.

(I am CCing Vincent Lefevre, who did some work on it.)
Vincent, is there a newer version of sxw2txt available, please? (or
the one that I found is the latest one?)

I don't know if it's necessary to have a package just for a simple
Perl script (maybe it can be included inside another package,
odt2txt?), but we can think of a solution for this problem.

[1] http://www.sfr-fresh.com/unix/misc/lesspipe-1.53.tar.gz:a/lesspipe-1.53/sxw2txt
[2] http://www-zeuthen.desy.de/~friebel/unix/less/sxw2txt
[3] http://bugs.debian.org/281351

Best regards,
Nelson






Bug#478091: [tracker] Use odt2txt instead of the removed o3read

2008-05-21 Thread Michael Biebl

Vincent Bernat wrote:
> On Wed, 21 May 2008 14:11:19 +0200, "Rene Engelhard"
> <[EMAIL PROTECTED]> wrote:
>
>>> Seems like odt2txt can't handle OpenOffice 1.x documents, only ODF.
>>
>> As the name already says. *odt*. ;-) Otherwise it'd say sxw2txt ;-)
>> What about unoconv as an alternative? (Ccing the unoconv maintainer)
>
> Unfortunately, unoconv needs the complete openoffice.org up and running
> (and an X server to host it). This may be a bit heavy for an indexing
> solution.

Hm, yeah, that's indeed a bit heavy weight ;-)

Any other alternatives?




Bug#478091: [tracker] Use odt2txt instead of the removed o3read

2008-05-21 Thread Vincent Bernat


On Wed, 21 May 2008 14:11:19 +0200, "Rene Engelhard"
<[EMAIL PROTECTED]> wrote:

>> Seems like odt2txt can't handle OpenOffice 1.x documents, only ODF.
> 
> As the name already says. *odt*. ;-) Otherwise it'd say sxw2txt ;-)
> What about unoconv as an alternative? (Ccing the unoconv maintainer)

Unfortunately, unoconv needs the complete openoffice.org up and running
(and an X server to host it). This may be a bit heavy for an indexing
solution.







Bug#478091: [tracker] Use odt2txt instead of the removed o3read

2008-05-21 Thread Michael Biebl

Rene Engelhard wrote:
> Hi,
>
>> Seems like odt2txt can't handle OpenOffice 1.x documents, only ODF.
>
> As the name already says. *odt*. ;-) Otherwise it'd say sxw2txt ;-)
> What about unoconv as an alternative? (Ccing the unoconv maintainer)

Interesting.

What are the pros and cons of unoconv compared to odt2txt?

Cheers,
Michael




Bug#478091: [tracker] Use odt2txt instead of the removed o3read

2008-05-21 Thread Rene Engelhard
Hi,

> Seems like odt2txt can't handle OpenOffice 1.x documents, only ODF.

As the name already says. *odt*. ;-) Otherwise it'd say sxw2txt ;-)
What about unoconv as an alternative? (Ccing the unoconv maintainer)

Regards,

Rene






Bug#478091: [tracker] Use odt2txt instead of the removed o3read

2008-05-21 Thread Michael Biebl

David wrote:
> Package: tracker
> Version: 0.6.6-1+b1
> Severity: serious
>
> --- Please enter the report below this line. ---
>
> tracker uses o3read to extract the text of OpenOffice documents. In fact,
> o3read is in the recommendation list of tracker.
>
> Nevertheless, o3read has been dropped from the repository in favour of
> odt2txt: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=477311 (furthermore,
> o3read does not work with the new OpenOffice format).



Seems like odt2txt can't handle OpenOffice 1.x documents, only ODF.

This would be a regression compared to o3read, which can handle both.

I don't know to what degree o3read supports ODF, but at least
unzip -p file.odt content.xml | o3totxt
seems to work just fine.
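
The same idea as the unzip pipeline above can be sketched without o3totxt. This is a rough Python illustration, not what tracker or o3read actually does: it assumes the document is a zip archive with a content.xml member, and the function name extract_text is invented for the example. It ignores all formatting and simply collects the text nodes, so the output will differ from what a real converter produces.

```python
import zipfile
import xml.etree.ElementTree as ET


def extract_text(path_or_file):
    """Return the plain text found in content.xml, one text node per line.

    Hypothetical helper for illustration only; it does no layout handling.
    """
    with zipfile.ZipFile(path_or_file) as archive:
        # Same member that `unzip -p file.odt content.xml` writes to stdout.
        xml_bytes = archive.read("content.xml")
    root = ET.fromstring(xml_bytes)
    # itertext() walks the tree in document order and yields every text node.
    pieces = (piece.strip() for piece in root.itertext())
    return "\n".join(piece for piece in pieces if piece)
```

Something like print(extract_text("file.odt")) would then dump the text. For an indexer this kind of crude extraction may be enough, but that is an assumption that would need testing on real OOo 1.x and ODF files.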

So, odt2txt is no real substitute unfortunately.

Is there a way to handle OOo 1.x documents with odt2txt, or could it be
fixed to do so?


Are there alternatives for processing/extracting text from OOo 1.x 
documents?


I CCed the OOo and odt2txt maintainers. Hopefully they can share their 
insight on this.


Cheers,
Michael




Bug#478091: [tracker] Use odt2txt instead of the removed o3read

2008-04-26 Thread Michael Biebl

severity 478091 important
thanks

David wrote:
> Package: tracker
> Version: 0.6.6-1+b1
> Severity: serious
>
> --- Please enter the report below this line. ---
>
> tracker uses o3read to extract the text of OpenOffice documents. In fact,
> o3read is in the recommendation list of tracker.
>
> Nevertheless, o3read has been dropped from the repository in favour of
> odt2txt: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=477311 (furthermore,
> o3read does not work with the new OpenOffice format).



Thanks for the notice. I'll update the dependencies accordingly.
As o3read is only a Recommends though, severity serious is not justified
and so I'm downgrading to important.


Cheers,
Michael




Bug#478091: [tracker] Use odt2txt instead of the removed o3read

2008-04-26 Thread David
Package: tracker
Version: 0.6.6-1+b1
Severity: serious

--- Please enter the report below this line. ---

tracker uses o3read to extract the text of OpenOffice documents. In fact,
o3read is in the recommendation list of tracker.

Nevertheless, o3read has been dropped from the repository in favour of
odt2txt: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=477311 (furthermore,
o3read does not work with the new OpenOffice format).

--- System information. ---
Architecture: i386
Kernel: Linux 2.6.24-1-686

Debian Release: lenny/sid
990 unstable www.debian-multimedia.org
990 unstable ftp.uk.debian.org
500 stable dl.google.com
500 experimental www.debian-multimedia.org
1 experimental ftp.uk.debian.org

--- Package information. ---
Depends (Version)              | Installed
==============================-+-==============
dbus                           | 1.2.1-1
libc6 (>= 2.7-1)               | 2.7-10
libcairo2                      | 1.6.4-1+b1
libdbus-1-3 (>= 1.0.2)         | 1.2.1-1
libdbus-glib-1-2 (>= 0.74)     | 0.74-2
libexempi3                     | 2.0.0-1
libexif12                      | 0.6.16-2.1
libglib2.0-0 (>= 2.16.0)       | 2.16.3-2
libgmime-2.0-2a                | 2.2.18-2
libgsf-1-114 (>= 1.14.8)       | 1.14.8-1
libgstreamer0.10-0 (>= 0.10.9) | 0.10.19-3
libgtk2.0-0 (>= 2.12.0)        | 2.12.9-3
libhal1 (>= 0.5.8.1)           | 0.5.11~rc2-1
libpango1.0-0 (>= 1.20.2)      | 1.20.2-2
libpng12-0 (>= 1.2.13-4)       | 1.2.26-1
libpoppler-glib2 (>= 0.6)      | 0.6.4-1
libqdbm14 (>= 1.8.74)          | 1.8.74-1.1
libsqlite3-0 (>= 3.5.7)        | 3.5.7-2
libunac1                       | 1.8.0-2
libxml2 (>= 2.6.27)            | 2.6.32.dfsg-2
shared-mime-info               | 0.23-5
zlib1g (>= 1:1.1.4)            | 1:1.2.3.3.dfsg-12