>>>>> "CG" == Christian Grün <christian.gr...@gmail.com> writes:
CG> Jidanni,

>> echo '<A>你好</A>'|perl -pwle 's![^[:ascii:]]!$&<wbr/>!'|basex -q '
>> declare option db:parser "html";
>> declare option output:method "raw";
>> doc("/dev/stdin")//*:wbr/..'

CG> If you want help, please try to help, too. Your example is not what I
CG> would call very helpful; give us at least:

CG>   a) a minimized example,

It already is minimal and fully self-contained. Just run it from a shell
command line on Linux or similar.

CG>   b) the returned output, and

OK, here it is, quoted-printable (QP) encoded:
=EF=BF=BD=EF=BF=BD=EF=BF=BD=E5=A5=BD=
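For anyone trying to read that QP line: here is a small Python sketch (standard library quopri only) that decodes it and prints each code point. If I read it right, it is three U+FFFD replacement characters followed by 好 (U+597D). In other words, the bytes of 你 were already replaced by the time the <wbr/> was gone.

```python
import quopri

# QP-encoded BaseX output, copied from the line above.
# The trailing "=" is a QP soft line break and carries no data.
qp_output = b"=EF=BF=BD=EF=BF=BD=EF=BF=BD=E5=A5=BD="

raw = quopri.decodestring(qp_output)  # back to the raw bytes
text = raw.decode("utf-8")            # these bytes are valid UTF-8

# Show each character with its code point.
for ch in text:
    print(f"U+{ord(ch):04X} {ch}")
```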

CG>   c) the expected result

I'm just trying to find a way to remove the <wbr/> injected here,
$ echo '<A>你好</A>'|perl -pwle 's![^[:ascii:]]!$&<wbr/>!'|qprint -e
<A>=E4<wbr/>=BD=A0=E5=A5=BD</A>

So I can get
<A>=E4=BD=A0=E5=A5=BD</A>

I am guessing that is not possible with BaseX, and that one needs
byte-level tools like Perl.
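To illustrate the byte-level point, here is a rough Python sketch rather than Perl. It mimics the one-liner on raw bytes: without /g, only the first non-ASCII byte gets a <wbr/> after it. Removing the marker from the bytes restores the original. Decoding first does not. The errors="replace" decoding stands in for how a lenient parser might treat the broken sequence, which is my assumption and not BaseX's documented behaviour. Still, the result is consistent with the QP output quoted earlier.

```python
import re

original = "<A>你好</A>".encode("utf-8")

# Mimic perl's s![^[:ascii:]]!$&<wbr/>! on bytes: no /g, so only the
# FIRST non-ASCII byte (0xE4, the lead byte of 你) gets <wbr/> after it.
injected = re.sub(rb"[^\x00-\x7f]", lambda m: m.group(0) + b"<wbr/>",
                  original, count=1)
print(injected)  # b'<A>\xe4<wbr/>\xbd\xa0\xe5\xa5\xbd</A>'

# Byte-level repair: drop the marker before any UTF-8 decoding happens.
repaired = injected.replace(b"<wbr/>", b"")
assert repaired == original

# Decoding first (assumed lenient decoder) turns 0xE4, 0xBD and 0xA0
# into three U+FFFD characters, so removing <wbr/> afterwards
# can no longer bring back 你.
lossy = injected.decode("utf-8", errors="replace").replace("<wbr/>", "")
print(lossy == "<A>\ufffd\ufffd\ufffd\u597d</A>")  # True
```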

>> declare option output:encoding "RAW"; or "BYTES" or "NONE"

CG> I’m not sure if you will need any output declaration for your query at
CG> all; but we first need more details.

>> http://docs.basex.org/wiki/Serialization
>> it just says
>> "all encodings supported by Java"
>> So one is supposed to look at
>> http://www.google.com/search?q=all+encodings+supported+by+Java

CG> I've added a link. Note, however, that the list is also dependent on
CG> the Java VM you are using.

OK, also do make a note of that fact there...

>> Why doesn't basex have a command that would output the current
>> "all encodings supported by Java"
>> that it is using.

CG> Try this:

CG>   basex "Q{java.nio.charset.Charset}availableCharsets()"

Gawd!
$ basex "Q{java.nio.charset.Charset}availableCharsets()"|wc
      0     167    3593
One big line (no trailing newline, so wc counts 0 lines), and every
name is repeated twice!

$ basex "Q{java.nio.charset.Charset}availableCharsets()"|
  perl -nwle 'print for /([^\s{]+)=/g'|wc
    167     167    1713
looks much nicer and has half the bytes.
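The same filtering as a Python sketch. It assumes the output is a single Java-style map line of the form {name=charset, ...}, which is what the Perl regex above implies. The sample line is hypothetical and shortened. The real list (167 entries here) depends on the JVM.

```python
import re

# Hypothetical, shortened stand-in for the single line printed by
# basex "Q{java.nio.charset.Charset}availableCharsets()".
line = "{Big5=Big5, ISO-8859-1=ISO-8859-1, US-ASCII=US-ASCII, UTF-8=UTF-8}"

# Same regex as the perl filter: capture the text before each "=",
# i.e. the map keys, which drops the duplicated values.
names = re.findall(r"([^\s{]+)=", line)
print("\n".join(names))
```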

Do make a note of it on the wiki there. Thanks.
_______________________________________________
BaseX-Talk mailing list
BaseX-Talk@mailman.uni-konstanz.de
https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk