Benetl, a free ETL tool for MySQL, out in version 4.1

2012-09-30 Thread Benoît Carpentier

Dear all,

Benetl, a free ETL tool for MySQL, is out in version 4.1.

This version brings GUI improvement: control of field names and table 
(EtlTool) name.
With this version, it is now possible to use  and  with if operator 
on a part formula.

This version brings some code correction and improvement.

Benetl is freely dowloadable at:http://www.benetl.net

You can learn more about ETL tools at: 
http://en.wikipedia.org/wiki/Extract,_transform,_load


Thanks for your interest.

Regards,

--
Benoît Carpentier
http://www.benetl.net
Founder of Benetl and Java project manager



Re: Need Help Converting Character Sets

2012-09-30 Thread Mark Phillips
Thanks to all of you for your very helpful suggestions! I was pulled off on
a different project for a few days, and I am now just getting back to this
one. Sorry for my absence.

First, some more information:

My table definition:
CREATE TABLE `Articles` (
`articleID` int(11) unsigned NOT NULL AUTO_INCREMENT,
`title` varchar(200) NOT NULL DEFAULT '',
`author` varchar(200) NOT NULL DEFAULT '',
`body` text NOT NULL,
`intro` text NOT NULL,
`caption` text NOT NULL,
`credits` text NOT NULL,
`articleDate` date NOT NULL DEFAULT '-00-00',
`imageTitle` varchar(255) NOT NULL DEFAULT '',
`imageAltText` varchar(255) NOT NULL DEFAULT '',
`imageWidth` float NOT NULL DEFAULT '1',
`imageHeight` float NOT NULL DEFAULT '1',
`imageFile` varchar(255) NOT NULL DEFAULT '',
PRIMARY KEY (`articleID`),
FULLTEXT KEY `search1` (`title`,`author`,`body`,`caption`,`credits`)
) ENGINE=MyISAM AUTO_INCREMENT=89 DEFAULT CHARSET=latin1

First problem - CHARSET should be utf8.

The data for this table comes from a web page (charet utf8). I copy/paste
word files into gedit (on linux) and then copy/paste from gedit to a text
boxes on the web page input form. I had thought I was stripping out all the
funky characters by using a simple ascii editor like gedit, but obviously
not.

After looking at the mysqldump for the table in a hex editor, I discovered
I have these characters scatter throughout the body and intro columns:
#8220;
#8221;
#8217;
#8212;
#8230;
#8617;

I tried converting the columns into utf8 with the command
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8;
but all the special characters are still there.

I tried converting to blob and back to utf8, and that didn't change
anything. (I had to first drop the fulltext key to convert to blob).
ALTER TABLE t1 CHANGE c1 c1 BLOB;
ALTER TABLE t1 CHANGE c1 c1 TEXT CHARACTER SET utf8;
The above special characters are still in the body column when I view the
dump file with a hex editor.

Is there a way to replace these special characters with the appropriate
utf8 characters (eg #8220; - 22 hex) within the text columns body and
intro columns?

Thanks,

Mark

On Fri, Sep 28, 2012 at 7:59 AM, Rick James rja...@yahoo-inc.com wrote:

 Thanks for that link!  That's another subtle issue I had not noted.

 There are so many combinations, that it is hard to say do this:
 * Incoming bytes are latin1 / utf8 / Microsquish control characters.
 * You do/don't have SET NAMES (or equivalent)
 * The database/table/column is declared latin1/utf8/other.
 * The problem is on ingestion / on retrieval.

 The thing mentioned involved 2 steps:
 ALTER TABLE ... MODIFY COLUMN  BINARY (or BLOB);  -- to forget any charset
 knowledge
 ALTER TABLE ... MODIFY COLUMN  CHARACTER SET ...;  -- coming from BINARY,
 this does not check the encoding.
 (sorry, don't have the link handy)

  -Original Message-
  From: h...@tbbs.net [mailto:h...@tbbs.net]
  Sent: Thursday, September 27, 2012 2:24 PM
  To: Mark Phillips
  Cc: Mysql List
  Subject: Re: Need Help Converting Character Sets
 
   2012/09/24 16:28 -0700, Mark Phillips 
  I have a table, Articles, of news articles (in English) with three text
  columns for the intro, body, and caption. The data came from a web
  page, and the content was cut and pasted from other sources. I am
  finding that there are some non utf-8 characters in these three text
  columns. I would like to (1) convert these text fields to be strict
  utf-8 and then (2) fix the input page to keep all new submissions utf-
  8.
 
  91) For the first step, fixing the current database, I tried:
 
  update Articles set body = CONVERT(body USING ASCII);
 
  However, when I checked one of the articles I found an apostrophe had
  been converted into a question mark. (FWIW, the apostrophe was one of
  those offending non utf-8 characters):
 
  Before conversion: I stepped into the observatory?s control room ...
 
  After conversion: I stepped into the observatory?s control room...
 
  Is there a better way to accomplish my first goal, without reading each
  article and manually making the changes?
  
  I do not remember where on the MySQL website this is, but there was an
  article about converting from character sets in version 4 to those in
  version 5, when UTF-8 first was supported. It sounds to me that maybe
  the tricks shown there would be useful to you, since, in effect,
  through MySQL MySQL was fooled into accepting for UTF-8 that which was
  not. Conversion to binary string was mentioned.
 
 
  --
  MySQL General Mailing List
  For list archives: http://lists.mysql.com/mysql
  To unsubscribe:http://lists.mysql.com/mysql