Re: Converters and encodings

2007-01-24 Thread Andre Poenitz
On Mon, Jan 22, 2007 at 12:06:06AM +0100, Enrico Forestieri wrote:
> > In the past we were moving to absolute paths
> > wherever possible in order to get rid of the Path class that is simply too
> > confusing to use.
> 
> Really? I find it quite clever ;-)

I'd be more likely to agree if it were named 'TempPathChanger' or such...

Andre'


Re: Converters and encodings

2007-01-24 Thread Abdelrazak Younes

Georg Baum wrote:
> Michael Gerz wrote:
>> Are there any known showstoppers ATM? I intend to have a second look at
>> the plain text output but this can wait (marginal changes).
>
> For the final release we need to have a look at CJK, since the old hacks no
> longer work (see http://bugzilla.lyx.org/show_bug.cgi?id=3043). This means
> adding languages Japanese, Korean, Chinese and appropriate encodings (file
> format change).

Arabic support is a problem also.

> IMO we do not need to put this in the first beta (I guess there will be more
> than one), since the changes will probably be localized.

I agree.

Abdel.



Re: Converters and encodings

2007-01-24 Thread Georg Baum
Michael Gerz wrote:

> Are there any known showstoppers ATM? I intend to have a second look at
> the plain text output but this can wait (marginal changes).

For the final release we need to have a look at CJK, since the old hacks no
longer work (see http://bugzilla.lyx.org/show_bug.cgi?id=3043). This means
adding languages Japanese, Korean, Chinese and appropriate encodings (file
format change).
IMO we do not need to put this in the first beta (I guess there will be more
than one), since the changes will probably be localized.


Georg



Re: Converters and encodings

2007-01-23 Thread José Matos
On Tuesday 23 January 2007 9:37:24 pm Michael Gerz wrote:
> Before the beta, we should have another look at Status.15x. Those bugs
> that haven't been fixed yet should be added to bugzilla, and Status.15x
> should be removed.

  I agree.

> We should also decide on the LyX installer before the beta (depends on
> the availability of Joost or the willingness of others to jump in).

  rpm works for me. ;-)
  I agree.

> Are there any known showstoppers ATM? I intend to have a second look at
> the plain text output but this can wait (marginal changes).

  CT is a showstopper, plain text export is not. ;-)

> Do you intend to release more than one beta release?

  I would like to decide that after seeing the reaction to the code. We
know, as Enrico stated above, that there are some encoding problems that need
to be solved. Only more widespread use will reveal some of them and their
extent...

> Michael

-- 
José Abílio


Re: Converters and encodings

2007-01-23 Thread Michael Gerz

José Matos wrote:
> On Tuesday 23 January 2007 7:30:10 pm Michael Gerz wrote:
>> Jose, please wait for a day or two. I have some important CT patches.
>
>   I am searching for feedback, so no problem, I will wait. :-)
>
>   I just want to give a subtle sign that we should prepare for a beta. :-)

Before the beta, we should have another look at Status.15x. Those bugs
that haven't been fixed yet should be added to bugzilla, and Status.15x
should be removed.

We should also decide on the LyX installer before the beta (depends on
the availability of Joost or the willingness of others to jump in).

Are there any known showstoppers ATM? I intend to have a second look at
the plain text output but this can wait (marginal changes).

Do you intend to release more than one beta release?

Michael


Re: Converters and encodings

2007-01-23 Thread José Matos
On Tuesday 23 January 2007 7:30:10 pm Michael Gerz wrote:
>
> Jose, please wait for a day or two. I have some important CT patches.

  I am searching for feedback, so no problem, I will wait. :-)

  I just want to give a subtle sign that we should prepare for a beta. :-)

> Michael



-- 
José Abílio


Re: Converters and encodings

2007-01-23 Thread Michael Gerz

Enrico Forestieri wrote:
>>   [Personal note: Trying to understand what needs to be done before declaring
>> a freeze period before beta release.]
>
> I hope this is not too late ;-)

Jose, please wait for a day or two. I have some important CT patches.

Michael


Re: Converters and encodings

2007-01-23 Thread Enrico Forestieri
On Tue, Jan 23, 2007 at 05:47:14PM +, José Matos wrote:

> On Sunday 21 January 2007 8:05:24 pm Enrico Forestieri wrote:
> > Currently, a converter path cannot contain non-ascii characters on systems
> > not using utf8 as the local encoding. Moreover, arguments are also passed
> > in utf8 encoding. This is clearly wrong and produces assertions, e.g. on
> > Windows but also on Solaris when using a locale different from utf8.
> >
> > I had a look at the code and came to the conclusion that the FileName
> > class with its toFilesystemEncoding method is not of help here. This
> > is because we have to also deal with non-absolute paths and some
> > arguments may also be in utf8.
> >
> > IMO we should have functions for directly converting from utf8 to the
> > file system encoding and vice versa. I would like to hear opinions about
> > this issue and, for discussion, attach here a patch solving these problems.
> 
>   Georg and Enrico do you have any verdict on this patch?

Following the discussion with Georg, I committed this patch:
http://www.lyx.org/trac/changeset/16803

>   It seems to fix a real bug, what is missing?

I think that we are progressively smashing encoding issues, but I fear
that there may be other cases to be yet discovered.

>   [Personal note: Trying to understand what needs to be done before declaring 
> a freeze period before beta release.]

I hope this is not too late ;-)

-- 
Enrico


Re: Converters and encodings

2007-01-23 Thread José Matos
On Sunday 21 January 2007 8:05:24 pm Enrico Forestieri wrote:
> Currently, a converter path cannot contain non-ascii characters on systems
> not using utf8 as the local encoding. Moreover, arguments are also passed
> in utf8 encoding. This is clearly wrong and produces assertions, e.g. on
> Windows but also on Solaris when using a locale different from utf8.
>
> I had a look at the code and came to the conclusion that the FileName
> class with its toFilesystemEncoding method is not of help here. This
> is because we have to also deal with non-absolute paths and some
> arguments may also be in utf8.
>
> IMO we should have functions for directly converting from utf8 to the
> file system encoding and vice versa. I would like to hear opinions about
> this issue and, for discussion, attach here a patch solving these problems.

  Georg and Enrico do you have any verdict on this patch?

  It seems to fix a real bug, what is missing?

  [Personal note: Trying to understand what needs to be done before declaring 
a freeze period before beta release.]

-- 
José Abílio


Re: Converters and encodings

2007-01-22 Thread Enrico Forestieri
On Mon, Jan 22, 2007 at 03:44:44PM +0100, Georg Baum wrote:

> Enrico Forestieri wrote:
> 
> > On Sun, Jan 21, 2007 at 10:41:29PM +0100, Georg Baum wrote:
> >> In the past we were moving to absolute paths
> >> wherever possible in order to get rid of the Path class that is simply
> >> too confusing to use.
> > 
> > Really? I find it quite clever ;-)
> 
> Yes, once you understand how it works. But until you do, you don't
> understand code like this:
> 
> if (conv.original_dir) {
>     Path p(buffer->filePath());
>     res = one.startscript(type, command);
> } else
>     res = one.startscript(type, command);
>
> Why not
>
> if (conv.original_dir)
>     Path p(buffer->filePath());
> res = one.startscript(type, command);
> 
> ?

I see. But the scope concept should be a fundamental one to understand,
and I think that the comment in path.h is quite clear.
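The scope rule at stake can be shown with a small, self-contained C++ sketch. It uses a hypothetical RAII class named after Andre's suggestion, `TempPathChanger`, which is similar in spirit to LyX's `Path` as described in this thread but is not that class. It assumes a POSIX system (`getcwd`, `chdir`), uses "/" as a directory that always exists, and simulates `one.startscript(type, command)` by reading the working directory.

```cpp
#include <cassert>
#include <cstring>
#include <stdexcept>
#include <string>
#include <unistd.h>  // POSIX getcwd(), chdir()

// Return the current working directory of the process.
std::string current_dir()
{
    std::string buf(4096, '\0');
    if (::getcwd(&buf[0], buf.size()) == nullptr)
        throw std::runtime_error("getcwd failed");
    buf.resize(std::strlen(buf.c_str()));
    return buf;
}

// Hypothetical RAII guard: switches to `dir` on construction and switches
// back to the previous directory when the object goes out of scope.
class TempPathChanger {
public:
    explicit TempPathChanger(std::string const & dir)
        : saved_(current_dir())
    {
        if (::chdir(dir.c_str()) != 0)
            throw std::runtime_error("chdir failed: " + dir);
    }
    ~TempPathChanger()
    {
        // Destructors must not throw, so a failed restore is ignored here.
        int const rc = ::chdir(saved_.c_str());
        (void)rc;
    }
    TempPathChanger(TempPathChanger const &) = delete;
    TempPathChanger & operator=(TempPathChanger const &) = delete;

private:
    std::string saved_;
};

// Braced form from the thread: the guard's scope also covers the work.
std::string run_correct(bool original_dir, std::string const & dir)
{
    if (original_dir) {
        TempPathChanger p(dir);
        return current_dir();  // runs inside `dir`
    }
    return current_dir();
}

// Unbraced form: the guard is the whole body of the if statement, so it
// is destroyed (and the old directory restored) before the work runs.
std::string run_broken(bool original_dir, std::string const & dir)
{
    if (original_dir)
        TempPathChanger p(dir);
    return current_dir();
}

// Usage: "/" exists on every POSIX system.
void demo_path_scope()
{
    std::string const before = current_dir();
    assert(run_correct(true, "/") == "/");
    assert(run_broken(true, "/") == before);  // change already undone
    assert(current_dir() == before);          // guard restored the old cwd
}
```

In `run_broken` the guard lives only for the single statement that forms the body of the `if`, which is why the braces in the original snippet cannot simply be dropped.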

-- 
Enrico


Re: Converters and encodings

2007-01-22 Thread Georg Baum
Enrico Forestieri wrote:

> On Sun, Jan 21, 2007 at 10:41:29PM +0100, Georg Baum wrote:
>> Why non-absolute paths?
> 
> There's a comment in converter.C about the fact that some converters can
> only output files to the current directory, so makeRelPath is used with
> first and second arguments as utf8 encoded paths. Whereas the
> first argument can be easily converted to the file system encoding (it's
> a FileName), the second one is the path encoded as a utf8 string.

OK, I remember.

>> In the past we were moving to absolute paths
>> wherever possible in order to get rid of the Path class that is simply
>> too confusing to use.
> 
> Really? I find it quite clever ;-)

Yes, once you understand how it works. But until you do, you don't
understand code like this:

if (conv.original_dir) {
    Path p(buffer->filePath());
    res = one.startscript(type, command);
} else
    res = one.startscript(type, command);

Why not

if (conv.original_dir)
    Path p(buffer->filePath());
res = one.startscript(type, command);

?
 
>> And why would some arguments be in utf8? AFAIK commandline arguments
>> should always be in the current locale.
> 
> I am talking about the parameters that are added to the command line to
> be executed. Due to lack of time, I was not able to ascertain how they
> are stored in the Converter class. AFAIK they could be in utf8 encoding.

Yes, they can be, if somebody enters an argument in the preferences file.
AFAIK LyX itself only creates ASCII commands. I thought you had
something specific in mind.
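Pure-ASCII commands matter because ASCII bytes (0x00 to 0x7F) mean the same thing in UTF-8 and in ASCII-compatible 8-bit encodings such as ISO-8859-1 or the Windows code pages, so they pass through unchanged; only strings containing a byte of 0x80 or above, for instance an argument typed into the preferences file, need converting. A minimal C++ sketch of that check, with a hypothetical helper name:

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: true if the UTF-8 string contains any non-ASCII
// byte and therefore has to be re-encoded before it is passed to a
// command line running in a non-UTF-8 locale.
bool needs_reencoding(std::string const & utf8)
{
    return std::any_of(utf8.begin(), utf8.end(), [](char c) {
        return static_cast<unsigned char>(c) >= 0x80;
    });
}
```

For example, `needs_reencoding("latex file.tex")` is false, while a file name containing a UTF-8 encoded "é" makes it true.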

> Yes, I thought about implementing to_filesystem8bit, but then realized
> that I would not have used it alone but always in conjunction with
> from_utf8. Introducing to_filesystem8bit, only converter.C and the two
> python scripts should be changed. I'll try to do it.

Good. I'll have a look at the string->docstring stuff maybe next week.

> BTW, I never use anything but ascii paths ;-)

Me too. But our users are resistant to any education on this issue :-)


Georg



Re: Converters and encodings

2007-01-21 Thread Enrico Forestieri
On Sun, Jan 21, 2007 at 10:41:29PM +0100, Georg Baum wrote:

> Enrico Forestieri wrote:

> > I had a look at the code and came to the conclusion that the FileName
> > class with its toFilesystemEncoding method is not of help here. This
> > is because we have to also deal with non-absolute paths and some
> > arguments may also be in utf8.
> 
> Why non-absolute paths?

There's a comment in converter.C about the fact that some converters can
only output files to the current directory, so makeRelPath is used with
first and second arguments as utf8 encoded paths. Whereas the
first argument can be easily converted to the file system encoding (it's
a FileName), the second one is the path encoded as a utf8 string.
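A converter that can only write into its working directory has to be given an output name relative to that directory. LyX does this with `makeRelPath`; the sketch below is not that function but swaps in C++17's `std::filesystem::path::lexically_relative` to show the computation. On POSIX both inputs are plain byte strings (UTF-8 here), so the result would still have to be converted to the file system encoding afterwards.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical makeRelPath-style helper: express `target` relative to
// `base`. The work is purely lexical and never touches the disk.
std::string relative_to(std::string const & target, std::string const & base)
{
    fs::path const rel = fs::path(target).lexically_relative(base);
    // An empty result means no relative form exists (e.g. one path is
    // relative and the other absolute); fall back to the input.
    return rel.empty() ? target : rel.generic_string();
}
```

For example, `/home/user/fig/a.eps` relative to `/home/user/doc` becomes `../fig/a.eps`.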

> In the past we were moving to absolute paths
> wherever possible in order to get rid of the Path class that is simply too
> confusing to use.

Really? I find it quite clever ;-)

> And why would some arguments be in utf8? AFAIK commandline arguments should
> always be in the current locale.

I am talking about the parameters that are added to the command line to
be executed. Due to lack of time, I was not able to ascertain how they
are stored in the Converter class. AFAIK they could be in utf8 encoding.

> > IMO we should have functions for directly converting from utf8 to the
> > file system encoding and vice versa. I would like to hear opinions about
> > this issue and, for discussion, attach here a patch solving these
> > problems.
> 
> As I already wrote to Abdel last week the fact that all our filename related
> code does still use std::string and not docstring is only of temporary
> nature. I did not convert it yet because that would mean a lot of
> conversions e.g. in some bibtex related code that cannot be converted to
> docstring easily.

Oh, that's it. I was wondering why you chose to go through those double
conversions...

> Therefore I don't think that we should convert from/to utf8 directly. If a
> function
> 
> std::string const to_filesystem8bit(docstring const & s);
> 
> is needed, then add it, but instead of direct conversion from/to utf8 I
> would rather change all filename related code to docstring. I can do that
> if you want, but not too soon (the next week will be busy).

Yes, I thought about implementing to_filesystem8bit, but then realized that
I would not have used it alone but always in conjunction with from_utf8.
Introducing to_filesystem8bit, only converter.C and the two python scripts
should be changed. I'll try to do it.
BTW, I never use anything but ascii paths ;-)

-- 
Enrico


Re: Converters and encodings

2007-01-21 Thread Georg Baum
Enrico Forestieri wrote:

> Currently, a converter path cannot contain non-ascii characters on systems
> not using utf8 as the local encoding. Moreover, arguments are also passed
> in utf8 encoding. This is clearly wrong and produces assertions, e.g. on
> Windows but also on Solaris when using a locale different from utf8.

This is wrong indeed. I thought that I converted all filenames given as
commandline arguments, but obviously I missed some.

> I had a look at the code and came to the conclusion that the FileName
> class with its toFilesystemEncoding method is not of help here. This
> is because we have to also deal with non-absolute paths and some
> arguments may also be in utf8.

Why non-absolute paths? In the past we were moving to absolute paths
wherever possible in order to get rid of the Path class that is simply too
confusing to use.
And why would some arguments be in utf8? AFAIK commandline arguments should
always be in the current locale.

> IMO we should have functions for directly converting from utf8 to the
> file system encoding and vice versa. I would like to hear opinions about
> this issue and, for discussion, attach here a patch solving these
> problems.

As I already wrote to Abdel last week the fact that all our filename related
code does still use std::string and not docstring is only of temporary
nature. I did not convert it yet because that would mean a lot of
conversions e.g. in some bibtex related code that cannot be converted to
docstring easily.
Therefore I don't think that we should convert from/to utf8 directly. If a
function

std::string const to_filesystem8bit(docstring const & s);

is needed, then add it, but instead of direct conversion from/to utf8 I
would rather change all filename related code to docstring. I can do that
if you want, but not too soon (the next week will be busy).
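A sketch of what the proposed `to_filesystem8bit(docstring const &)` could look like. It assumes that `docstring` holds 32-bit UCS-4 code points (modeled here with `std::u32string`), adds a hypothetical `fs_encoding` parameter in place of querying the locale, and uses POSIX `iconv` with a UTF-32 source in host byte order; none of this is LyX's actual implementation.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <iconv.h>
#include <stdexcept>
#include <string>

// Assumption for this sketch: docstring holds UCS-4 code points.
using docstring = std::u32string;

// iconv name of UTF-32 in this machine's byte order.
char const * host_utf32_name()
{
    std::uint32_t const probe = 1;
    bool const little = *reinterpret_cast<unsigned char const *>(&probe) == 1;
    return little ? "UTF-32LE" : "UTF-32BE";
}

// Sketch of the proposed function; `fs_encoding` (hypothetical) stands in
// for the file system encoding of the current locale.
std::string const to_filesystem8bit(docstring const & s, char const * fs_encoding)
{
    iconv_t const cd = ::iconv_open(fs_encoding, host_utf32_name());
    if (cd == reinterpret_cast<iconv_t>(-1))
        throw std::runtime_error("unsupported encoding");

    // iconv works on bytes; char may alias the char32_t storage.
    char * in = reinterpret_cast<char *>(const_cast<char32_t *>(s.data()));
    std::size_t in_left = s.size() * sizeof(char32_t);
    std::string out;
    char buf[256];

    for (;;) {
        char * dst = buf;
        std::size_t dst_left = sizeof(buf);
        std::size_t const rc = ::iconv(cd, &in, &in_left, &dst, &dst_left);
        int const err = errno;  // save before any other call can change it
        out.append(buf, static_cast<std::size_t>(dst - buf));
        if (rc != static_cast<std::size_t>(-1))
            break;  // all input consumed
        if (err != E2BIG) {  // unrepresentable or invalid code point
            ::iconv_close(cd);
            throw std::runtime_error("name not representable in file system encoding");
        }
    }

    char * dst = buf;
    std::size_t dst_left = sizeof(buf);
    ::iconv(cd, nullptr, nullptr, &dst, &dst_left);  // flush shift state
    out.append(buf, static_cast<std::size_t>(dst - buf));

    ::iconv_close(cd);
    return out;
}
```

Applied to a docstring this needs no intermediate UTF-8 step; with the current UTF-8 `std::string` paths it would be combined with LyX's `from_utf8` (not reproduced here), the pairing Enrico mentions in his reply.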


Georg