Re: [Rdkit-discuss] Export pandas DataFrame to xlsx with molecule images

2014-11-04 Thread Grégori Gerebtzoff

Hi Samo,

I don't think you need temporary files at all;
have you tried something like this?

from cStringIO import StringIO
image_file = StringIO()
img = Draw.MolToImage(row[molCol], size=size)
img.save(image_file, format='PNG')
image_data = image_file.getvalue()

Grégori


On 03. 11. 14 21:44, Samo Turk wrote:

Hi Grégori,

Thanks for pointing this out. I modified the code and now it writes 
only one temporary file.


Cheers,
Samo

On Fri, Oct 31, 2014 at 10:56 AM, Grégori Gerebtzoff 
greg...@gerebtzoff.com mailto:greg...@gerebtzoff.com wrote:


Hi Samo,

I used a few years ago the PHPExcel library to put images into an
Excel file, and it was not necessary to use physical files.
Having a quick look at the library I found this class (probably
the one I used): PHPExcel_Worksheet_MemoryDrawing (source code:

https://github.com/clariondoor/PHPExcel/blob/master/Worksheet/MemoryDrawing.php)
The interesting bit:
public function __construct()
{
// Initialise values
$this-_imageResource= null;
$this-_renderingFunction = self::RENDERING_DEFAULT;
$this-_mimeType= self::MIMETYPE_DEFAULT;
$this-_uniqueName= md5(rand(0, ). time()
. rand(0, ));

// Initialize parent
parent::__construct();
}

Thus I'm pretty sure you can use the same trick in python
XlsxWriter (have a look at the _add_image_files function in
packager.py), using a random file name and a bit stream to the
image, as described here:

http://xlsxwriter.readthedocs.org/en/latest/example_images_bytesio.html#ex-images-bytesio:

filename   = 'python.png'
image_file = open(filename, 'rb')
image_data = BytesIO(image_file.read())
image_file.close()
# Write the byte stream image to a cell. The filename must  be
specified.
worksheet.insert_image('B8', filename, {'image_data': image_data})

At least it's worth a try!
Another trick I had to do both with PHPExcel and in VBA was to set
the width of columns three times to make sure that it was actually
correct. Don't ask me why... Just in case you face some width issues.

Good luck!

Grégori



On 30. 10. 14 16:49, Samo Turk wrote:

Hi rdkiters,

Due to popular demand I started to work on a function to export
pandas DataFrame to xlsx with molecule images embedded.
Because of the xlsx specifics the code is not optimal. The most
annoying thing about this implementation is that it has to write
all images to the hard drive, before it packs them in xlsx (and
deletes them at the end). I checked two python xlsx libraries and
both save images that way. If someone finds better solution,
please share it.

The dimensions of cells with images are not optimal because Excel
is weird. :) From xlsxwriter docs): The width corresponds to the
column width value that is specified in Excel. It is
approximately equal to the length of a string in the default font
of Calibri 11. Unfortunately, there is no way to specify
“AutoFit” for a column in the Excel file format.

It crashes if value of a cell is of wrong type so use
df['value'].astype() to fix incorrectly assigned types.

Resulting files work nicely in Office 365 (standalone and web
app), but for some reason don't work optimally with LibreOffice
(after row ~125 it stacks all images).

I made a pull request on GitHub:
https://github.com/rdkit/rdkit/pull/371
Demo:

http://nbviewer.ipython.org/github/Team-SKI/snippets/blob/master/IPython/rdkit_hackaton/XLSX%20export.ipynb
Demo xlsx file:

https://github.com/Team-SKI/snippets/blob/master/IPython/rdkit_hackaton/demo.xlsx

Regards,
Samo



--


___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net  
mailto:Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss




--

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
mailto:Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss




--
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Export pandas DataFrame to xlsx with molecule images

2014-11-04 Thread Samo Turk
Hi Grégori,

I tried exactly what you suggested already yesterday but it didn't work. I
did find a solution after some fiddling and now it works.
https://github.com/rdkit/rdkit/pull/371/files

Thanks!


On Tue, Nov 4, 2014 at 9:32 AM, Grégori Gerebtzoff greg...@gerebtzoff.com
wrote:

  Hi Samo,

 I don't think you need temporary files at all;
 have you tried something like this?

 from cStringIO import StringIO
 image_file = StringIO()
 img = Draw.MolToImage(row[molCol], size=size)
 img.save(image_file, format='PNG')
 image_data = image_file.getvalue()

 Grégori



 On 03. 11. 14 21:44, Samo Turk wrote:

  Hi Grégori,

  Thanks for pointing this out. I modified the code and now it writes only
 one temporary file.

  Cheers,
 Samo

 On Fri, Oct 31, 2014 at 10:56 AM, Grégori Gerebtzoff 
 greg...@gerebtzoff.com wrote:

  Hi Samo,

 I used a few years ago the PHPExcel library to put images into an Excel
 file, and it was not necessary to use physical files.
 Having a quick look at the library I found this class (probably the one I
 used): PHPExcel_Worksheet_MemoryDrawing (source code:
 https://github.com/clariondoor/PHPExcel/blob/master/Worksheet/MemoryDrawing.php
 )
 The interesting bit:
 public function __construct()
 {
 // Initialise values
 $this-_imageResource= null;
 $this-_renderingFunction = self::RENDERING_DEFAULT;
 $this-_mimeType= self::MIMETYPE_DEFAULT;
 $this-_uniqueName= md5(rand(0, ). time() .
 rand(0, ));

 // Initialize parent
 parent::__construct();
 }

 Thus I'm pretty sure you can use the same trick in python XlsxWriter
 (have a look at the _add_image_files function in packager.py), using a
 random file name and a bit stream to the image, as described here:
 http://xlsxwriter.readthedocs.org/en/latest/example_images_bytesio.html#ex-images-bytesio
 :

 filename   = 'python.png'
 image_file = open(filename, 'rb')
 image_data = BytesIO(image_file.read())
 image_file.close()
 # Write the byte stream image to a cell. The filename must  be specified.
 worksheet.insert_image('B8', filename, {'image_data': image_data})

 At least it's worth a try!
 Another trick I had to do both with PHPExcel and in VBA was to set the
 width of columns three times to make sure that it was actually correct.
 Don't ask me why... Just in case you face some width issues.

 Good luck!

 Grégori



 On 30. 10. 14 16:49, Samo Turk wrote:

 Hi rdkiters,

  Due to popular demand I started to work on a function to export pandas
 DataFrame to xlsx with molecule images embedded.
  Because of the xlsx specifics the code is not optimal. The most annoying
 thing about this implementation is that it has to write all images to the
 hard drive, before it packs them in xlsx (and deletes them at the end). I
 checked two python xlsx libraries and both save images that way. If someone
 finds better solution, please share it.

  The dimensions of cells with images are not optimal because Excel is
 weird. :) From xlsxwriter docs): The width corresponds to the column width
 value that is specified in Excel. It is approximately equal to the length
 of a string in the default font of Calibri 11. Unfortunately, there is no
 way to specify “AutoFit” for a column in the Excel file format.

  It crashes if value of a cell is of wrong type so use
 df['value'].astype() to fix incorrectly assigned types.

  Resulting files work nicely in Office 365 (standalone and web app), but
 for some reason don't work optimally with LibreOffice (after row ~125 it
 stacks all images).

   I made a pull request on GitHub:
 https://github.com/rdkit/rdkit/pull/371
  Demo:
 http://nbviewer.ipython.org/github/Team-SKI/snippets/blob/master/IPython/rdkit_hackaton/XLSX%20export.ipynb
  Demo xlsx file:
 https://github.com/Team-SKI/snippets/blob/master/IPython/rdkit_hackaton/demo.xlsx

  Regards,
 Samo


   
 --



 ___
 Rdkit-discuss mailing 
 listRdkit-discuss@lists.sourceforge.nethttps://lists.sourceforge.net/lists/listinfo/rdkit-discuss




 --

 ___
 Rdkit-discuss mailing list
 Rdkit-discuss@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/rdkit-discuss




--
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Export pandas DataFrame to xlsx with molecule images

2014-11-03 Thread Samo Turk
Hi Grégori,

Thanks for pointing this out. I modified the code and now it writes only
one temporary file.

Cheers,
Samo

On Fri, Oct 31, 2014 at 10:56 AM, Grégori Gerebtzoff greg...@gerebtzoff.com
 wrote:

  Hi Samo,

 I used a few years ago the PHPExcel library to put images into an Excel
 file, and it was not necessary to use physical files.
 Having a quick look at the library I found this class (probably the one I
 used): PHPExcel_Worksheet_MemoryDrawing (source code:
 https://github.com/clariondoor/PHPExcel/blob/master/Worksheet/MemoryDrawing.php
 )
 The interesting bit:
 public function __construct()
 {
 // Initialise values
 $this-_imageResource= null;
 $this-_renderingFunction = self::RENDERING_DEFAULT;
 $this-_mimeType= self::MIMETYPE_DEFAULT;
 $this-_uniqueName= md5(rand(0, ). time() .
 rand(0, ));

 // Initialize parent
 parent::__construct();
 }

 Thus I'm pretty sure you can use the same trick in python XlsxWriter (have
 a look at the _add_image_files function in packager.py), using a random
 file name and a bit stream to the image, as described here:
 http://xlsxwriter.readthedocs.org/en/latest/example_images_bytesio.html#ex-images-bytesio
 :

 filename   = 'python.png'
 image_file = open(filename, 'rb')
 image_data = BytesIO(image_file.read())
 image_file.close()
 # Write the byte stream image to a cell. The filename must  be specified.
 worksheet.insert_image('B8', filename, {'image_data': image_data})

 At least it's worth a try!
 Another trick I had to do both with PHPExcel and in VBA was to set the
 width of columns three times to make sure that it was actually correct.
 Don't ask me why... Just in case you face some width issues.

 Good luck!

 Grégori



 On 30. 10. 14 16:49, Samo Turk wrote:

Hi rdkiters,

  Due to popular demand I started to work on a function to export pandas
 DataFrame to xlsx with molecule images embedded.
  Because of the xlsx specifics the code is not optimal. The most annoying
 thing about this implementation is that it has to write all images to the
 hard drive, before it packs them in xlsx (and deletes them at the end). I
 checked two python xlsx libraries and both save images that way. If someone
 finds better solution, please share it.

  The dimensions of cells with images are not optimal because Excel is
 weird. :) From xlsxwriter docs): The width corresponds to the column width
 value that is specified in Excel. It is approximately equal to the length
 of a string in the default font of Calibri 11. Unfortunately, there is no
 way to specify “AutoFit” for a column in the Excel file format.

  It crashes if value of a cell is of wrong type so use
 df['value'].astype() to fix incorrectly assigned types.

  Resulting files work nicely in Office 365 (standalone and web app), but
 for some reason don't work optimally with LibreOffice (after row ~125 it
 stacks all images).

   I made a pull request on GitHub: https://github.com/rdkit/rdkit/pull/371
  Demo:
 http://nbviewer.ipython.org/github/Team-SKI/snippets/blob/master/IPython/rdkit_hackaton/XLSX%20export.ipynb
  Demo xlsx file:
 https://github.com/Team-SKI/snippets/blob/master/IPython/rdkit_hackaton/demo.xlsx

  Regards,
 Samo


 --



 ___
 Rdkit-discuss mailing 
 listRdkit-discuss@lists.sourceforge.nethttps://lists.sourceforge.net/lists/listinfo/rdkit-discuss




 --

 ___
 Rdkit-discuss mailing list
 Rdkit-discuss@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


--
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Export pandas DataFrame to xlsx with molecule images

2014-10-31 Thread Grégori Gerebtzoff

Hi Samo,

I used a few years ago the PHPExcel library to put images into an Excel 
file, and it was not necessary to use physical files.
Having a quick look at the library I found this class (probably the one 
I used): PHPExcel_Worksheet_MemoryDrawing (source code: 
https://github.com/clariondoor/PHPExcel/blob/master/Worksheet/MemoryDrawing.php)

The interesting bit:
public function __construct()
{
// Initialise values
$this-_imageResource= null;
$this-_renderingFunction = self::RENDERING_DEFAULT;
$this-_mimeType= self::MIMETYPE_DEFAULT;
$this-_uniqueName= md5(rand(0, ). time() . 
rand(0, ));


// Initialize parent
parent::__construct();
}

Thus I'm pretty sure you can use the same trick in python XlsxWriter 
(have a look at the _add_image_files function in packager.py), using a 
random file name and a bit stream to the image, as described here: 
http://xlsxwriter.readthedocs.org/en/latest/example_images_bytesio.html#ex-images-bytesio:


filename   = 'python.png'
image_file = open(filename, 'rb')
image_data = BytesIO(image_file.read())
image_file.close()
# Write the byte stream image to a cell. The filename must  be specified.
worksheet.insert_image('B8', filename, {'image_data': image_data})

At least it's worth a try!
Another trick I had to do both with PHPExcel and in VBA was to set the 
width of columns three times to make sure that it was actually correct. 
Don't ask me why... Just in case you face some width issues.


Good luck!

Grégori


On 30. 10. 14 16:49, Samo Turk wrote:

Hi rdkiters,

Due to popular demand I started to work on a function to export pandas 
DataFrame to xlsx with molecule images embedded.
Because of the xlsx specifics the code is not optimal. The most 
annoying thing about this implementation is that it has to write all 
images to the hard drive, before it packs them in xlsx (and deletes 
them at the end). I checked two python xlsx libraries and both save 
images that way. If someone finds better solution, please share it.


The dimensions of cells with images are not optimal because Excel is 
weird. :) From xlsxwriter docs): The width corresponds to the column 
width value that is specified in Excel. It is approximately equal to 
the length of a string in the default font of Calibri 11. 
Unfortunately, there is no way to specify “AutoFit” for a column in 
the Excel file format.


It crashes if value of a cell is of wrong type so use 
df['value'].astype() to fix incorrectly assigned types.


Resulting files work nicely in Office 365 (standalone and web app), 
but for some reason don't work optimally with LibreOffice (after row 
~125 it stacks all images).


I made a pull request on GitHub: https://github.com/rdkit/rdkit/pull/371
Demo: 
http://nbviewer.ipython.org/github/Team-SKI/snippets/blob/master/IPython/rdkit_hackaton/XLSX%20export.ipynb
Demo xlsx file: 
https://github.com/Team-SKI/snippets/blob/master/IPython/rdkit_hackaton/demo.xlsx


Regards,
Samo


--


___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


--
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss