Re: [dev] Re: Import of relative links from .doc files.

2010-09-14 Thread Caolán McNamara
On Tue, 2010-09-14 at 16:58 +0200, Michael Stahl wrote:
> maybe it would make sense to move the property hyperlink blob decoding to
> sfx2 as well (if the format for xls/ppt/doc is the same, about which i
> have no idea).

Yeah, poking at it, the parsing/decoding should be unified in there, seeing
as, as it currently stands, the DocumentSummaryInformation stream is
getting parsed twice for .ppt files.

Given the current information about the hyperlink foo and .docs though,
it would be important *not* to directly round trip the hyperlink blob in
an import / export cycle seeing as modified hyperlinks in a .doc would
apparently be ignored in favour of round-tripped DSI entries.

C.


-
To unsubscribe, e-mail: dev-unsubscr...@openoffice.org
For additional commands, e-mail: dev-h...@openoffice.org



Re: [dev] Re: Import of relative links from .doc files.

2010-09-14 Thread Caolán McNamara
On Tue, 2010-09-14 at 15:03 +0200, Knut Olav Bøhmer wrote:
> Is sfx2 reading its parts before the .doc importer is run, and could OOo
> be set to read the relevant parts when sfx2 does its work, and then save them
> for later?

sfx2 reads different, more standardized parts. 

Directly pulling out these odd things in, say,
SwWW8ImplReader::ReadDocInfo (where pStg is a pointer to the same type of
object that mrStorage is a reference to at
http://svn.services.openoffice.org/opengrok/xref/Current%20%28trunk%29/sd/source/filter/ppt/pptin.cxx#275)
is probably a better way to go: stuff the info into some table belonging
to SwWW8ImplReader, and then in Read_F_Hyperlink try to look up the
matching entry in that table for the specific field index to get a
replacement hyperlink if an entry exists.
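
A minimal sketch of that lookup, written in Python rather than the importer's C++ purely for brevity. The `HyperlinkTable` class, `resolve_hyperlink` and the idea of keying directly by field index are assumptions for illustration, not existing OOo code:

```python
from typing import Dict, Optional


class HyperlinkTable:
    """Hypothetical table of replacement hyperlinks, keyed by field index."""

    def __init__(self) -> None:
        self._by_field: Dict[int, str] = {}

    def add(self, field_index: int, url: str) -> None:
        # Filled once, while the document summary properties are being read.
        self._by_field[field_index] = url

    def lookup(self, field_index: int) -> Optional[str]:
        # Returns None when no replacement was recorded for this field.
        return self._by_field.get(field_index)


def resolve_hyperlink(table: HyperlinkTable, field_index: int,
                      field_code_url: str) -> str:
    """Prefer the table entry; fall back to the URL from the field code."""
    replacement = table.lookup(field_index)
    return replacement if replacement is not None else field_code_url
```

The fallback matters: a field with no matching entry keeps the URL from its own field code, so documents without such entries import as before.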

C.





Re: [dev] Re: Import of relative links from .doc files.

2010-09-14 Thread Knut Olav Bøhmer
On 14 September 2010 14:58, Caolán McNamara  wrote:

> On Tue, 2010-09-14 at 14:40 +0200, Knut Olav Bøhmer wrote:
> > Thank you for the analysis :) this really helps.
> > What parts of the Word document are available at the time of importing the
> > hyperlink? Are all of the parts read in, so that the DocumentSummaryStream
> > could be accessed at that time?
>
> It's not read in by the .doc importer currently. Some generic parts are
> read in by sfx2 to set generic properties, but there's something in
> sd/source/filter/ppt/pptin.cxx (search for _PID_HLINKS) which looks like
> roughly the same kind of thing for the ppt format. That would also need
> to be done in the ww8 importer to parse them out and attempt the (weird
> and wonderful) mapping from dwApp to field code mentioned in the
> "Application Data for VtHyperlink" section of the .doc spec.


Is sfx2 reading its parts before the .doc importer is run, and could OOo
be set to read the relevant parts when sfx2 does its work, and then save them
for later?


-- 
Knut Olav Bøhmer


Re: [dev] Re: Import of relative links from .doc files.

2010-09-14 Thread Caolán McNamara
On Tue, 2010-09-14 at 14:40 +0200, Knut Olav Bøhmer wrote:
> Thank you for the analysis :) this really helps.
> What parts of the Word document are available at the time of importing the
> hyperlink? Are all of the parts read in, so that the DocumentSummaryStream
> could be accessed at that time?

It's not read in by the .doc importer currently. Some generic parts are
read in by sfx2 to set generic properties, but there's something in
sd/source/filter/ppt/pptin.cxx (search for _PID_HLINKS) which looks like
roughly the same kind of thing for the ppt format. That would also need
to be done in the ww8 importer to parse them out and attempt the (weird
and wonderful) mapping from dwApp to field code mentioned in the
"Application Data for VtHyperlink" section of the .doc spec.

C.





Re: [dev] Re: Import of relative links from .doc files.

2010-09-14 Thread Knut Olav Bøhmer
On 14 September 2010 14:26, Caolán McNamara  wrote:

> On Tue, 2010-09-14 at 12:54 +0200, Knut Olav Bøhmer wrote:
> > Hi,
> >
> >
> > Attached to this email is a document with a hyperlink that opens
> > differently in MS Office and OpenOffi
>
> In word if you open the .doc and toggle field codes on you can see that
> the relative url is recorded as "..\\..\\FELLES.PROSESSER.doc#k11" in
> the field code itself which is the URL that OOo is taking and converting
> to an absolute URL relative to the location that the imported .doc is
> loaded from. (Not sure why we are turning it from relative to absolute
> in the .doc importer, sounds like a bad idea, but that's neither here
> nor there in this case)
>
> But it looks like word *isn't* taking the field code itself as the url
> to display/use in this case, but instead using the value tucked away in
> the DocumentSummaryStream in the VtHyperlink sequence (I think), so it
> sort of looks like the algorithm for hyperlinks needs to be to import the
> DocumentSummaryStream links and match them up with the fields they are
> associated with and use those links in favour of the inline ones where
> they exist.
>
> A bit of a nightmare, not sure how it's possible to get them out of sync
> with each other.


Thank you for the analysis :) this really helps.
What parts of the Word document are available at the time of importing the
hyperlink? Are all of the parts read in, so that the DocumentSummaryStream
could be accessed at that time?



Best regards
-- 
Knut Olav Bøhmer


Re: [dev] Re: Import of relative links from .doc files.

2010-09-14 Thread Caolán McNamara
On Tue, 2010-09-14 at 12:54 +0200, Knut Olav Bøhmer wrote:
> Hi,
> 
> 
> Attached to this email is a document with a hyperlink that opens
> differently in MS Office and OpenOffi

In word if you open the .doc and toggle field codes on you can see that
the relative url is recorded as "..\\..\\FELLES.PROSESSER.doc#k11" in
the field code itself which is the URL that OOo is taking and converting
to an absolute URL relative to the location that the imported .doc is
loaded from. (Not sure why we are turning it from relative to absolute
in the .doc importer, sounds like a bad idea, but that's neither here
nor there in this case)
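
For illustration, a rough Python sketch of that relative-to-absolute step. It is not the importer's actual code; the `to_absolute` helper and the base location used in the note below are made up for the example. It un-escapes the doubled backslashes from the field code, switches to forward slashes, and resolves against the URL the .doc was loaded from:

```python
from urllib.parse import urljoin


def to_absolute(base_url: str, relative: str) -> str:
    """Resolve a Windows-style relative link against the document's URL."""
    # Field codes store each backslash doubled, so collapse the pairs first.
    unescaped = relative.replace("\\\\", "\\")
    # URL resolution expects "/" as the path separator.
    return urljoin(base_url, unescaped.replace("\\", "/"))
```

With a hypothetical base of file:///srv/docs/a/b/test.doc, the field code value above resolves to file:///srv/docs/FELLES.PROSESSER.doc#k11, i.e. two directories up from wherever the .doc happens to sit.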

But it looks like word *isn't* taking the field code itself as the url
to display/use in this case, but instead using the value tucked away in
the DocumentSummaryStream in the VtHyperlink sequence (I think), so it
sort of looks like the algorithm for hyperlinks needs to be to import the
DocumentSummaryStream links and match them up with the fields they are
associated with and use those links in favour of the inline ones where
they exist.

A bit of a nightmare, not sure how it's possible to get them out of sync
with each other.

C.





Re: [dev] Re: Import of relative links from .doc files.

2010-09-14 Thread Knut Olav Bøhmer
Hi,

Attached to this email is a document with a hyperlink that opens
differently in MS Office and OpenOffice.org.

Is anyone able to take a look at it?

2010/9/13 Knut Olav Bøhmer 

>
> I will upload a test document tomorrow. I don't have any documents here
> without sensitive data in them, and I don't have Word, so I need the owner to
> produce a document without data (only the buggy data) in it.
>
> I will post the update here tomorrow.
>
> http://qa.openoffice.org/issues/show_bug.cgi?id=114485
>
> On 13 September 2010 22:00, Caolán McNamara  wrote:
>
>> And as an aside, the hyperlink importer for the .doc format is in
>> sw/source/filter/ww8 as SwWW8ImplReader::Read_F_Hyperlink and it's a
>> fairly trivial thing
>>
>> C.
>>
>>
>>
>>
>
>
> --
> Knut Olav Bøhmer
>



-- 
Knut Olav Bøhmer

Re: [dev] Re: Import of relative links from .doc files.

2010-09-13 Thread Knut Olav Bøhmer
I will upload a test document tomorrow. I don't have any documents here
without sensitive data in them, and I don't have Word, so I need the owner to
produce a document without data (only the buggy data) in it.

I will post the update here tomorrow.

http://qa.openoffice.org/issues/show_bug.cgi?id=114485

On 13 September 2010 22:00, Caolán McNamara  wrote:

> And as an aside, the hyperlink importer for the .doc format is in
> sw/source/filter/ww8 as SwWW8ImplReader::Read_F_Hyperlink and it's a
> fairly trivial thing
>
> C.
>
>
>
>


-- 
Knut Olav Bøhmer


Re: [dev] Re: Import of relative links from .doc files.

2010-09-13 Thread Caolán McNamara
And as an aside, the hyperlink importer for the .doc format is in
sw/source/filter/ww8 as SwWW8ImplReader::Read_F_Hyperlink and it's a
fairly trivial thing

C.





Re: [dev] Re: Import of relative links from .doc files.

2010-09-13 Thread Knut Olav Bøhmer
Thanks to freenode I got
gsf dump wordfile.doc WordDocument
and a push in the direction of Google ;)


2010/9/13 Knut Olav Bøhmer 

> Hi,
>
> I would like to see the specification of the .doc format.
> The bug is not in the conversion from relative to absolute. The bug is
> somewhere in the reading process. Already in
> SwWW8ImplReader::Read_Field, the relative URL shows up with part of it
> missing.
>
> It shows up as ..\..\foo.doc when it should have been ..\..\bar\foo.doc
>
> Is it possible to look at the strings in the Word doc somehow with a
> hex/ASCII reader?
>
>
> On 13 September 2010 11:36, Michael Stahl wrote:
>
>> On 12/09/2010 19:24, Knut Olav Bøhmer wrote:
>> > Thank you. Do you know if there is somewhere else in the code that the
>> > bug could be?
>> > As I understand it, there are some issues with the way openoffice.org
>> > keeps the URL represented in memory.
>> > Do you know anything about that?
>>
>> hi Knut,
>>
>> OOo will convert all URLs from relative to absolute on import.
>> usually on export URLs are converted to relative if possible, but i think
>> that is subject to some configuration settings.
>>
>> of course it could be possible that not every filter uses the right flags
>> when converting URLs...
>>
>> --
>> "Portability is for canoes." -- Jim McCarthy, Microsoft Corporation
>>
>>
>>
>>
>
>
> --
> Knut Olav Bøhmer
>



-- 
Knut Olav Bøhmer


Re: [dev] Re: Import of relative links from .doc files.

2010-09-13 Thread Caolán McNamara
On Mon, 2010-09-13 at 21:35 +0200, Knut Olav Bøhmer wrote:
> Is it possible to look at the strings in the Word doc somehow with a hex/ASCII
> reader?

If you have libgsf (e.g. on Linux) you can look at the WordDocument
stream (which has the text in it, with the fields inline in the text) with
e.g.

gsf dump test.doc WordDocument|less

Main document text typically starts around 0x400 or later in relatively
recent Word formats.
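
If paging through a hex dump by eye gets tedious, something like the following Python sketch can pull the HYPERLINK field codes out of the raw bytes of the stream, however you obtained them. The `find_hyperlink_fields` helper is made up for this example, and it only handles printable ASCII characters, stored either one byte per character or as UTF-16LE, so anything more exotic in a link would cut the match short:

```python
import re
from typing import List, Tuple

# "HYPERLINK" followed by printable ASCII, one byte per character.
_NARROW = re.compile(rb'HYPERLINK[\x20-\x7e]*')
# The same text stored as UTF-16LE (each ASCII byte followed by 0x00).
_WIDE = re.compile(
    rb'H\x00Y\x00P\x00E\x00R\x00L\x00I\x00N\x00K\x00(?:[\x20-\x7e]\x00)*')


def find_hyperlink_fields(data: bytes) -> List[Tuple[int, str]]:
    """Return (byte offset, text) for each HYPERLINK field code found."""
    hits = [(m.start(), m.group().decode("ascii"))
            for m in _NARROW.finditer(data)]
    hits += [(m.start(), m.group().decode("utf-16-le"))
             for m in _WIDE.finditer(data)]
    return sorted(hits)
```

Each match stops at the first non-printable byte, which in practice is the field separator or end mark that follows the field code.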

On Windows, there's some graphical ole2 viewer thing whose name I can't
remember that you could use as well, a search for "ole compound format
viewer" might throw it up.

Disable fastsave in Word if you are trying to cut a small document down
to a small test-case as fastsave complicates matters somewhat.

C.






Re: [dev] Re: Import of relative links from .doc files.

2010-09-13 Thread Knut Olav Bøhmer
Hi,

I would like to see the specification of the .doc format.
The bug is not in the conversion from relative to absolute. The bug is
somewhere in the reading process. Already in
SwWW8ImplReader::Read_Field, the relative URL shows up with part of it
missing.

It shows up as ..\..\foo.doc when it should have been ..\..\bar\foo.doc

Is it possible to look at the strings in the Word doc somehow with a hex/ASCII
reader?


On 13 September 2010 11:36, Michael Stahl wrote:

> On 12/09/2010 19:24, Knut Olav Bøhmer wrote:
> > Thank you. Do you know if there is somewhere else in the code that the
> > bug could be?
> > As I understand it, there are some issues with the way openoffice.org
> > keeps the URL represented in memory.
> > Do you know anything about that?
>
> hi Knut,
>
> OOo will convert all URLs from relative to absolute on import.
> usually on export URLs are converted to relative if possible, but i think
> that is subject to some configuration settings.
>
> of course it could be possible that not every filter uses the right flags
> when converting URLs...
>
> --
> "Portability is for canoes." -- Jim McCarthy, Microsoft Corporation
>
>
>
>


-- 
Knut Olav Bøhmer