[sqlite] PhD student

2015-02-26 Thread Hadley Wickham
I'd also recommend Paul Murrell's "Introduction to Data Technologies":
https://www.stat.auckland.ac.nz/~paul/ItDT/

Hadley


[sqlite] PhD student

2015-02-26 Thread Roman Fleysher
I like that!!!

Roman

From: sqlite-users-bounces at mailinglists.sqlite.org [sqlite-users-bounces at 
mailinglists.sqlite.org] on behalf of Simon Slavin [slav...@bigfraud.org]
Sent: Thursday, February 26, 2015 5:33 AM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] PhD student

On 25 Feb 2015, at 4:28pm, VASILEIOU Eleftheria  wrote:

> Could you please provide me some resources for learning SQL and R?





Simon.
___
sqlite-users mailing list
sqlite-users at mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Characters corrupt after importing a CSV file

2015-02-26 Thread Clemens Ladisch
Richard Hipp wrote:
> On 2/26/15, Adam Podstawczyński wrote:
>> Also, to provide more input, I have now noticed that even if the column
>> width is wider than the offending string, this issue still creates problems:
>> while nothing gets truncated, the position of the next column is
>> miscalculated, causing misalignment:
>
> Proposed fix:  
> https://www.sqlite.org/src/ci?name=b1a9e2916f5b4adef91c34563f71b98e79a10c12

That code correctly computes the number of characters.

However, in the Unicode world, the number of characters is not the same
as the number of columns, not even with so-called fixed-width fonts.

 says:
| In fixed-width output devices, Latin characters all occupy a single
| "cell" position of equal width, whereas ideographic CJK characters
| occupy two such cells.
| [...]
| The following two functions define the column width of an ISO 10646
| character as follows:
|
|- The null character (U+0000) has a column width of 0.
|
|- Other C0/C1 control characters and DEL will lead to a return
|  value of -1.
|
|- Non-spacing and enclosing combining characters (general
|  category code Mn or Me in the Unicode database) have a
|  column width of 0.
|
|- SOFT HYPHEN (U+00AD) has a column width of 1.
|
|- Other format characters (general category code Cf in the Unicode
|  database) and ZERO WIDTH SPACE (U+200B) have a column width of 0.
|
|- Hangul Jamo medial vowels and final consonants (U+1160-U+11FF)
|  have a column width of 0.
|
|- Spacing characters in the East Asian Wide (W) or East Asian
|  Full-width (F) category as defined in Unicode Technical
|  Report #11 have a column width of 2.
|
|- All remaining characters (including all printable
|  ISO 8859-1 and WGL4 characters, Unicode control characters,
|  etc.) have a column width of 1.
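
The rules quoted above can be sketched in a few lines of Python. This is only an illustration built on Python's `unicodedata` tables, not the code in the proposed fix or in the sqlite3 shell; the names `char_width`, `display_width` and `pad` are made up for the example.

```python
import unicodedata


def char_width(ch):
    """Approximate column width of one character, following the quoted rules."""
    cp = ord(ch)
    if cp == 0:
        return 0                      # null character
    if cp < 0x20 or 0x7F <= cp < 0xA0:
        return -1                     # other C0/C1 controls and DEL
    if cp == 0x00AD:
        return 1                      # SOFT HYPHEN
    if unicodedata.category(ch) in ("Mn", "Me", "Cf") or cp == 0x200B:
        return 0                      # combining and format characters
    if 0x1160 <= cp <= 0x11FF:
        return 0                      # Hangul Jamo medial vowels / final consonants
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2                      # East Asian Wide / Full-width
    return 1                          # everything else


def display_width(text):
    """Columns needed for text, or -1 if it contains a control character."""
    widths = [char_width(ch) for ch in text]
    return -1 if any(w < 0 for w in widths) else sum(widths)


def pad(text, columns):
    """Right-pad text with spaces up to the requested number of columns."""
    return text + " " * max(0, columns - display_width(text))
```

With this, a two-character CJK string counts as four columns even though it is two characters and six UTF-8 bytes, so padding by bytes or by characters would both misplace the next column.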


Regards,
Clemens


[sqlite] Characters corrupt after importing a CSV file

2015-02-26 Thread R.Smith
While it is a presentation issue in the end, it still is an issue, so 
thank you for happening upon it and bringing it to our attention, and no 
need to apologize for lack of investigation.

I think it is possibly the command line utility falling into the 
character-length trap which Hick mentioned when enforcing column widths.


Also, it is of course no longer needed to share the import file, but I 
wouldn't mind having a list of country names in localized format - if 
you are happy to share.





[sqlite] PhD student

2015-02-26 Thread Jim Callahan
Books that discuss BOTH R and SQL are a very small subset and assume some
knowledge of both.
R INTRODUCTORY BOOKS
1. Peter Dalgaard, "Introductory Statistics with R", 2002.
"The book is based upon a set of notes developed for the course in Basic
Statistics for Health Researchers at the Faculty of Health Sciences of the
University of Copenhagen. This course had as its primary target.. students
for the Ph.D. degree in medicine." Intro page viii.
body mass index (BMI) and age of menarche.
2. Jared Lander, "R for Everyone", 2014.
More modern, but less focused on health and a little more scattershot.

R AUTHORITATIVE REFERENCE
1. Brian Ripley and William Venables, "Modern Applied Statistics with S",
2002.

Anything by John Chambers, Robert Gentleman or Brian Ripley or any member
of the "R Core Development Team" can be considered authoritative (the stuff
you can footnote without frowns) on R.

Also, if you are going to use the R mailing list, read all of the PDFs that
come with the base installation of R. It's better now, but the R mailing
list used to have a very strong "RTFM" attitude and did not want to explain
anything that was clearly covered in the manuals. Especially read the "R
Import/Export Manual" PDF.

ADVANCED R (with SQL)
Depends on what you are doing.
If you are working with health surveys,
Thomas Lumley's "Complex Surveys" is invaluable. One of Lumley's
examples is the CDC's BRFSS: "The Behavioral Risk Factor Surveillance System
(BRFSS) is the world's largest, on-going telephone health survey system"
(from the CDC website). In Lumley's example it is provided as:

   - "The BRFSS 2007 data as a HUGE (245Mb) SQLite database."

1. Thomas Lumley, "Complex Surveys: A Guide to Analysis Using R",
http://r-survey.r-forge.r-project.org/svybook/index.html

On the other hand, if you are dealing with biological data, such as trying
to match results from GeneChips with existing reference sources, you might
prefer Robert Gentleman's "R Programming for Bioinformatics", especially
Chapter 8, "Data Technologies".

1. Robert Gentleman's "R Programming for Bioinformatics", 2009.
"We begin our discussion by describing a range of tools that have been
implemented in R and that can be used to process and transform data. Next
we discuss the different interfaces to databases that are available, but
focus our discussion on SQLite as it is used extensively within the
Bioconductor Project." page 229
The database discussion resumes on page 238 (Section 8.4) and reaches SQLite
on page 241, including a specific example:
"In the code below we load the SQLite package, initialize a driver and open
a database that has been supplied with the RBioinf [R] package that
accompanies this volume. The database contains a number of tables that map
between identifiers on the Affymetrix HG-U95v2 GeneChip and different
quantities of interest such as GO categories or PubMed IDs (that map
published papers that discuss the corresponding genes). We then list the
tables in that database."

Sometimes we get tired of reading dry tomes and we prefer something more
chatty and amusing.

For R and other tools I enjoy reading:

Cathy O'Neil's and Rachel Schutt's "Doing Data Science: Straight Talk from
the Frontline", 2013. It's an O'Reilly book.

For SQLite, I enjoy
Michael Owens's "The Definitive Guide to SQLite", 2006. Maybe not the
whole book, but the Chapter 4 example on page 75, "Foods mentioned in episodes
of the Seinfeld sitcom", is a hoot (and turned out to help me solve a
real-world problem).

If you are doing anything beyond Stats 101 classical statistics it helps to
understand the Bayesian bogeyman.

A fascinating, non-technical, historical account is provided by Sharon
Bertsch McGrayne, in her book "The Theory that would not Die...".

BAYESIAN STATISTICS (HISTORY)
Sharon Bertsch McGrayne,
"The Theory That Would Not Die:
How Bayes' Rule Cracked the Enigma Code, Hunted Down Russian Submarines,
and Emerged Triumphant from Two Centuries of Controversy", 2011.
http://yalepress.yale.edu/book.asp?isbn=9780300169690

"For the student who is being exposed to Bayesian statistics for the first
time, McGrayne's book provides a wealth of illustrations to whet his or her
appetite for more. It will broaden and deepen the field of reference of the
more experienced statistician, and the general reader will find an
understandable, well-written, and fascinating account of a scientific field
of great importance today. "
http://www.ams.org/notices/201205/rtx120500657p.pdf
All the more timely with the release of the movie "The Imitation Game",
because Turing & Co. cracked the German Enigma code using Bayesian
statistics.
There are few specifically "Bayesian" packages in R (for example, an
interface to BUGS), but the Bayesian approach lurks in the background of many
others; look for any use of the word "prior".

Hope this helps.
Jim

On Wed, Feb 25, 2015 at 11:28 AM, VASILEIOU Eleftheria  wrote:

>  Hi,
>
> I would need to use R for my analysis for my 

[sqlite] Database connection from within Visual Studio

2015-02-26 Thread Joe Mistachkin

Thanks for the suggestions.  As soon as the 1.0.95.0 release is out the door,
I'll work on refactoring the download page to make it easier to use.

--
Joe Mistachkin



[sqlite] Characters corrupt after importing a CSV file

2015-02-26 Thread Adam Podstawczyński
Hi Ryan,

I got it from here: 
http://jonahellison.com/21640-translated-country-names-unicode-csv

Also, to provide more input, I have now noticed that even if the column width 
is wider than the offending string, this issue still creates problems: while 
nothing gets truncated, the position of the next column is miscalculated, 
causing misalignment:

United States   ar   ??? ?  
  Hackensack  
United States   ca  Estats Units
  Hackensack
  
United States   ca  Estats Units
  Hackensack
  
United States   cs  Spojen? st?ty   
Hackensack  

United States   cs  Spojen? st?ty   
Hackensack  

United States   da  USA 
  Hackensack
  
United States   de  Vereinigte Staaten  
  Hackensack
  
United States   el   ? ???    
Hackensack  
United States   el   ? ???    
Hackensack  
United States   en  United States   
  Hackensack
  
United States   es  Estados Unidos  
  Hackensack
  
United States   fi  Yhdysvallat 
  Hackensack
  
United States   fr  ?tats-Unis  
 Hackensack 
 
United States   he  ? ? 
???  
United States   hr  Sjedinjene Dr?ave   
 Hackensack 
 
United States   hr  Sjedinjene Dr?ave   
 Hackensack

Thanks,
-- 
adam


[sqlite] Characters corrupt after importing a CSV file

2015-02-26 Thread Adam Podstawczyński
Hi Simon,

Yes, it seems so. Because column width is based on bytes, it may happen that 
e.g. a 2-byte character is split in half which produces a non-recognizable or 
control character (represented in the terminal as ???).

The solution would be to check the column width based on text length but 
respecting file encoding. This may not be trivial, since (AFAIK) sqlite3 is not 
informed of the file encoding at import time.
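
To make the split-character effect concrete, here is a small Python sketch. It is an illustration only, not the shell's actual code; the helper names and the sample word "Zürich" are invented for the example. It cuts the same string once by bytes and once by characters.

```python
def truncate_bytes(text, width):
    """Cut the UTF-8 encoding after `width` bytes, as a byte-based check would."""
    cut = text.encode("utf-8")[:width]
    # A multibyte character cut in half can no longer be decoded, so it
    # shows up as U+FFFD (often rendered as "?" in a terminal).
    return cut.decode("utf-8", errors="replace")


def truncate_chars(text, width):
    """Cut after `width` characters, which never splits a character."""
    return text[:width]


sample = "Zürich"                      # "ü" takes two bytes in UTF-8
broken = truncate_bytes(sample, 2)     # "Z" plus half of "ü"
intact = truncate_chars(sample, 2)     # "Zü"
```

Counting characters avoids the broken byte sequence, although it still does not give the right number of terminal columns for wide or combining characters.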

Thanks,
-- 
adam

> On 26 Feb 2015, at 12:24, Simon Slavin  wrote:
> 
> 
> On 26 Feb 2015, at 10:58am, Adam Podstawczyński wrote:
> 
>> Thank you Ryan. I will provide the files if still necessary, but I have just 
>> discovered it has to do with column width.
> 
> This suggests that the column width setting is being checked against the 
> number of bytes in the text value rather than the width of the text value 
> when being printed out.  Thanks for figuring it out.
> 
> Simon.



[sqlite] Characters corrupt after importing a CSV file

2015-02-26 Thread R.Smith
Thanks Adam,

Could you kindly post an example file somewhere (the kind which you 
import) for us to test with?  (The list does not allow attachments).

Also, could you try a later version of SQLite so that we could see if 
this problem still exists on your platform. Updated versions obtainable 
from:
http://www.sqlite.org/downloads/

Thanks.
Ryan




[sqlite] Characters corrupt after importing a CSV file

2015-02-26 Thread Simon Slavin

On 26 Feb 2015, at 10:58am, Adam Podstawczyński wrote:

> Thank you Ryan. I will provide the files if still necessary, but I have just 
> discovered it has to do with column width.

This suggests that the column width setting is being checked against the number 
of bytes in the text value rather than the width of the text value when being 
printed out.  Thanks for figuring it out.

Simon.


[sqlite] Characters corrupt after importing a CSV file

2015-02-26 Thread R.Smith
This might be a representation error only.

I followed the OP's method: I imported the file section into an 
encoding=UTF8 DB using the command-line facility's import of a 
tab-separated file, and then selected the values from it.

File content of utf8test.txt:

no	Newcastle	1	NULL	NULL	NULL
uk	???	1	NULL	NULL	NULL
ja	???	1	NULL	NULL	NULL


In the command-line facility I got all weird characters, the usual kind 
of thing when you try to represent Unicode / UTF8 in ANSI or ASCII text.

C:\Users\R.Smith\Desktop>sqlite3 TestDB2.db
SQLite version 3.8.8.1 2015-01-20 16:51:25
Enter ".help" for usage hints.
sqlite> PRAGMA encoding=UTF8;
sqlite> CREATE TABLE cities(ct TEXT,nm TEXT,i INT,v1,v2,v3);
sqlite> .separator "\t"
sqlite> .import utf8test.txt cities
sqlite> SELECT * FROM cities;
???no|Newcastle|1|NULL|NULL|NULL
uk|??|1|NULL|NULL|NULL
ja|?|1|NULL|NULL|NULL
sqlite>

(That might also just be my console output not being UTF8 friendly.)


However, when I used another DB tool to open the database, it looked 
perfect:

SELECT * FROM "cities" WHERE 1;

   -- ct   nm         i  v1    v2    v3
   -- ---  ---------  -  ----  ----  ----
   -- ?no  Newcastle  1  NULL  NULL  NULL
   -- uk   ???        1  NULL  NULL  NULL
   -- ja   ???        1  NULL  NULL  NULL

   --    Item Stats:  Item No:           2      Query Size (Chars):  33
   --                 Result Columns:    6      Result Rows:         3
   --                 VM Work Steps:     32     Rows Modified:       0
   --                 Full Query Time:   -- --- --- --- --.
   --                 Query Result:      Success.



Conclusion: The importing works fine, the display of data might be 
missing a trick.
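
One way to check the stored data without depending on any viewer is to look at the bytes themselves. The following is a rough Python sketch using the standard `sqlite3` module and SQLite's `hex()` function; the function name and the sample value are illustrative and are not taken from the OP's database.

```python
import sqlite3


def stored_text_and_hex(text):
    """Store text in a throwaway in-memory DB and return (text, hex of its bytes)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE cities(nm TEXT)")
        conn.execute("INSERT INTO cities(nm) VALUES (?)", (text,))
        # hex() shows the raw stored bytes, independent of how a console
        # or GUI tool renders them.
        return tuple(conn.execute("SELECT nm, hex(nm) FROM cities").fetchone())
    finally:
        conn.close()


value, raw = stored_text_and_hex("Zürich")
```

If the hex string is valid UTF-8 for the expected text (here the two bytes C3 BC for "ü"), the import is fine and only the display is at fault.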

Please ask the OP to test the DB with another viewer, and if still wrong 
please supply the imported file and resulting DB file from his import on 
some file-share site and note the SQLite version used.



On 2015-02-26 10:43 AM, Adam Podstawczyński wrote:
> Hi all,
>
> I experienced an issue with character encoding in sqlite3 following an import 
> from a CSV file.
>
> The issue is described here: 
> http://stackoverflow.com/questions/28719413/sqlite3-corrupts-some-chars-after-importing?noredirect=1#comment45732717_28719413
>
> In short: after importing a UTF-8 file via .import, only some non-Latin 
> chars, mostly at the end of strings, get corrupted.
>
> Please advise,



[sqlite] Characters corrupt after importing a CSV file

2015-02-26 Thread Adam Podstawczyński
Thank you Ryan. I will provide the files if still necessary, but I have just 
discovered it has to do with column width.

Please compare:

.width 5 5 20
select * from countrynameslocalized where iso="AE";
AE gswVer?inigti Arabisch
AE gu ???
AE he ? ?
AE hi ???
AE hr Ujedinjeni Arapski E
AE hu Egyes?lt Arab Emir?
AE hy ??? ???
AE ia Emiratos Arabe Unite
AE id Uni Emirat Arab 
AE is Sameinu?u arab?sku
AE it Emirati Arabi Uniti 
AE ja ???
AE ka ???
AE km ???
AE kn ???
AE ko ?? ?
AE lo ???
AE lt Jungtiniai Arab? Em

.width 5 5 150
select * from countrynameslocalized where iso="AE";
AE gswVer?inigti Arabischi Emir??t
AE gu  ???
AE he ? ? ???
AE hi ??? ??? ??
AE hr Ujedinjeni Arapski Emirati
AE hu Egyes?lt Arab Emir?tus
AE hy ???  ?
AE ia Emiratos Arabe Unite
AE id Uni Emirat Arab
AE is Sameinu?u arab?sku furstad?min
AE it Emirati Arabi Uniti
AE ja
AE ka  ? ?
AE km ??
AE kn ???
AE ko ?? ??
AE lo
AE lt Jungtiniai Arab? Emyratai

So, it looks like this is a presentation issue in the end. I must not have 
investigated it properly.

While this solves the issue for me, I still believe this behavior is 
confusing: truncated characters should be handled more gracefully.

Again, thank you for helping out and regards,
-- 
adam


[sqlite] PhD student

2015-02-26 Thread Gabor Grothendieck
On Wed, Feb 25, 2015 at 11:28 AM, VASILEIOU Eleftheria
 wrote:
> I need to use R for the analysis in my project, and my supervisor 
> suggested that I learn the SQL language for R.
> Could you please provide me some resources for learning SQL and R?

Assuming you are looking to use SQL to work with R data.frames, see
this link for numerous examples:

http://sqldf.googlecode.com


[sqlite] PhD student

2015-02-26 Thread Simon Slavin

On 25 Feb 2015, at 4:28pm, VASILEIOU Eleftheria  wrote:

> Could you please provide me some resources for learning SQL and R?





Simon.


[sqlite] Characters corrupt after importing a CSV file

2015-02-26 Thread Adam Podstawczyński
Hi all,

Thanks for trying to help. Answering all suggestions and questions:

1. I don't think this is a presentation issue: I exported the data which I had 
imported earlier, and the issue is there in the exported files.

2. Version information:

Last login: Thu Feb 26 09:06:41 on ttys004
Hg:locs adampodstawczynski$ sqlite3 --version
-- Loading resources from /Users/adampodstawczynski/.sqliterc

3.8.7.1 2014-10-29 13:59:56 3b7b72c4685aa5cf5e675c2c47ebec10d9704221
Hg:locs adampodstawczynski$ cat ~/.sqliterc 
.headers on
.mode column
Hg:locs adampodstawczynski$ uname -a
Darwin Hg 14.1.0 Darwin Kernel Version 14.1.0: Mon Dec 22 23:10:38 PST 2014; 
root:xnu-2782.10.72~2/RELEASE_X86_64 x86_64
Hg:locs adampodstawczynski$ 

This is on Mac OS 10.10.2 (Yosemite).

3. I'm not doing any length() operation anywhere in the process, so there is no 
risk of truncation because of that. I just import the data and the char 
corruption issue is immediately there.

Thanks,
-- 
adam




[sqlite] Characters corrupt after importing a CSV file

2015-02-26 Thread Hick Gunter
Maybe you are falling into the character/byte trap. The SQL function length() 
returns the number of CHARACTERS in a string, which, for UTF-encoded strings 
containing non-Latin characters, is smaller than the number of BYTES required 
to represent them.

Typically you will be losing bytes at the end of the string, which becomes 
really obvious only when you lose part of a multibyte UTF character.
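
The difference is easy to see from SQL itself. Below is a minimal Python sketch using the standard `sqlite3` module; the function name and the sample string are only illustrative, and it assumes the default UTF-8 database encoding. `length()` on a TEXT value counts characters, while `length()` on the same value cast to a BLOB counts bytes.

```python
import sqlite3


def char_and_byte_length(text):
    """Return (characters, bytes) for text as reported by SQLite's length()."""
    conn = sqlite3.connect(":memory:")   # default encoding is UTF-8
    try:
        row = conn.execute(
            "SELECT length(?1), length(CAST(?1 AS BLOB))", (text,)
        ).fetchone()
        return tuple(row)
    finally:
        conn.close()


chars, nbytes = char_and_byte_length("Zürich")
```

For plain ASCII the two numbers agree, which is why the trap tends to go unnoticed until non-Latin text is involved.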

-----Original Message-----
From: Adam Podstawczyński [mailto:adam at podstawczynski.com]
Sent: Thursday, 26 February 2015 09:44
To: sqlite-users at mailinglists.sqlite.org
Subject: [sqlite] Characters corrupt after importing a CSV file

Hi all,

I experienced an issue with character encoding in sqlite3 following an import 
from a CSV file.

The issue is described here: 
http://stackoverflow.com/questions/28719413/sqlite3-corrupts-some-chars-after-importing?noredirect=1#comment45732717_28719413

In short: after importing a UTF-8 file via .import, only some non-Latin chars, 
mostly at the end of strings, get corrupted.

Please advise,
--
adam



___
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: hick at scigames.at

This communication (including any attachments) is intended for the use of the 
intended recipient(s) only and may contain information that is confidential, 
privileged or legally protected. Any unauthorized use or dissemination of this 
communication is strictly prohibited. If you have received this communication 
in error, please immediately notify the sender by return e-mail message and 
delete all copies of the original communication. Thank you for your cooperation.




[sqlite] Characters corrupt after importing a CSV file

2015-02-26 Thread Richard Hipp
Proposed fix:  
https://www.sqlite.org/src/ci?name=b1a9e2916f5b4adef91c34563f71b98e79a10c12

On 2/26/15, Adam Podstawczy?ski  wrote:
> Hi Ryan,
>
> I got it from here:
> http://jonahellison.com/21640-translated-country-names-unicode-csv
>
> Also, to provide more input, I have now noticed that even if the column
> width is wider than the offending string, this issue still creates problems.
> While nothing gets truncated, the position of the next column is
> miscalculated, causing misalignment:
>
> United States   ar   ??? ?
>  Hackensack
> United States   ca  Estats Units
>  Hackensack
>
> United States   ca  Estats Units
>  Hackensack
>
> United States   cs  Spojen? st?ty
>Hackensack
>
> United States   cs  Spojen? st?ty
>Hackensack
>
> United States   da  USA
>  Hackensack
>
> United States   de  Vereinigte Staaten
>  Hackensack
>
> United States   el   ? ??? 
>  Hackensack
> United States   el   ? ??? 
>  Hackensack
> United States   en  United States
>  Hackensack
>
> United States   es  Estados Unidos
>  Hackensack
>
> United States   fi  Yhdysvallat
>  Hackensack
>
> United States   fr  ?tats-Unis
> Hackensack
>
> United States   he  ? ?
>???
> United States   hr  Sjedinjene Dr?ave
> Hackensack
>
> United States   hr  Sjedinjene Dr?ave
> Hackensack
>
> Thanks,
> --
> adam
>
>> On 26 Feb 2015, at 13:04, R.Smith  wrote:
>>
>> While it is a presentation issue in the end, it still is an issue, so
>> thank you for happening upon it and bringing it to our attention, and no
>> need to apologize for lack of investigation.
>>
>> I think it is possibly the command line utility falling into the
>> character-length trap which Hick mentioned when enforcing column widths.
>>
>>
>> Also, it is of course no longer needed to share the import file, but I
>> wouldn't mind having a list of country names in localized format - if you
>> are happy to share.
>>
>>
>> On 2015-02-26 12:58 PM, Adam Podstawczyński wrote:
>>> Thank you Ryan. I will provide the files if still necessary, but I have
>>> just discovered it has to do with column width.
>>>
>>> Please compare:
>>>
>>> .width 5 5 20
>>> select * from countrynameslocalized where iso="AE";
>>> AE gswVer?inigti Arabisch
>>> AE gu ???
>>> AE he ? ?
>>> AE hi ???
>>> AE hr Ujedinjeni Arapski E
>>> AE hu Egyes?lt Arab Emir?
>>> AE hy ??? ???
>>> AE ia Emiratos Arabe Unite
>>> AE id Uni Emirat Arab
>>> AE is Sameinu?u arab?sku
>>> AE it Emirati Arabi Uniti
>>> AE ja ???
>>> AE ka ???
>>> AE km ???
>>> AE kn ???
>>> AE ko ?? ?
>>> AE lo ???
>>> AE lt Jungtiniai Arab? Em
>>>
>>> .width 5 5 150
>>> select * from countrynameslocalized where iso="AE";
>>> AE gswVer?inigti Arabischi Emir??t
>>> AE gu  ??? 
>>> AE he ? ? ???
>>> AE hi ??? ??? ??
>>> AE hr Ujedinjeni Arapski Emirati
>>> AE hu Egyes?lt Arab Emir?tus
>>> AE hy ???  ?
>>> AE ia Emiratos Arabe Unite
>>> AE id Uni Emirat Arab
>>> AE is Sameinu?u arab?sku furstad?min
>>> AE it Emirati Arabi Uniti
>>> AE ja 
>>> AE ka  ? ?
>>> AE km ??
>>> AE kn ???  
>>> AE ko ?? ??
>>> AE lo 
>>> AE lt Jungtiniai Arab? Emyratai
>>>
>>> So, it looks like this is a presentation issue in the end. I must have
>>> not investigated it properly.
>>>
>>> While this solves the issue for me, I still believe this behavior is
>>> confusing: truncated characters should be handled more gracefully.
>>>
>>> Again, thank you for helping out and regards,

[sqlite] recurrent failure mode

2015-02-26 Thread Dave Dyer

>
>You might want to read my message on the topic from the list archives,
>dated Sat, 31 Jan 2015.  

In this case, no concurrent or multiple users are involved.  It's just
one client and the database.   There's still plenty of scope for the
networked file system to do things that make sqlite fail.



[sqlite] Characters corrupt after importing a CSV file

2015-02-26 Thread Adam Podstawczyński
Hi all,

I experienced an issue with character encoding in sqlite3 following an import 
from a CSV file.

The issue is described here: 
http://stackoverflow.com/questions/28719413/sqlite3-corrupts-some-chars-after-importing?noredirect=1#comment45732717_28719413

In short: after importing a UTF-8 file via .import, only some non-Latin chars, 
mostly at the end of strings, get corrupted.

Please advise,
-- 
adam



[sqlite] PhD student

2015-02-26 Thread John McKown
On Wed, Feb 25, 2015 at 10:28 AM, VASILEIOU Eleftheria
 wrote:
>  Hi,
>
> I need to use R for the analysis in my project, and my supervisor 
> suggested that I learn the SQL language for R.
> Could you please point me to some resources for learning SQL and R?

I'm not sure if you want to learn SQL and R, or if you want to learn
just R and its use of SQL (this assumes you know SQL already). My
confusion likely is a result of my being a Texan. ;-)

Some basics of the R language, in general, are here:

http://cran.r-project.org/manuals.html
http://adv-r.had.co.nz/ (this has more advanced R work by Hadley
Wickham, an R guru/wizard)
http://en.wikibooks.org/wiki/R_Programming

If you need to learn SQL as well, then there are some useful sites as well.
http://www.w3schools.com/sql/ is a nice one.
http://www.sqlcourse.com/ I don't know this one, but it looks interesting
Google search: 
https://www.google.com/webhp?sourceid=chrome-instant=1C1CHFX_enUS597US598=1=2=UTF-8#q=sql%20tutorial

If you tell us which SQL software you will be using, perhaps we could
recommend other sites or forums where you can get SQL specific help.

Now using SQL in R is a bit more difficult to explain, mainly because
there are differences depending on the SQL server you are using. I
have used both RODBC and DBI. RODBC is R for ODBC connections. ODBC is
an industry standard interface to a number of different SQL servers
such as MS SQL Server, PostgreSQL, and MySQL (MariaDB). DBI is another
interface, by the previously mentioned Hadley Wickham, which can
connect to many different database servers as well. Now, just for
myself, I prefer DBI because it is a closer match to what I am used to
using in other languages, such as Perl. A nice "README" by Mr. Wickham
is here:
https://github.com/rstats-db/DBI/blob/master/README.md
http://www.stat.berkeley.edu/~nolan/stat133/Fall05/lectures/SQL-R.pdf

An overview of RODBC is here:
http://cran.r-project.org/web/packages/RODBC/RODBC.pdf and perhaps of
some interest to you might be that the maintainer, Brian Ripley
(ripley at stats.ox.ac.uk), is in the U.K.

You didn't say if you were going to be connecting to an existing SQL
system, or creating your own. If you are going to create your own,
then perhaps the easiest to implement is SQLite. It is not as full
featured as Oracle, MS SQL Server, or PostgreSQL, but it has the
advantage of being "embedded". That is, there is very little set up
because the "server" code is embedded in the RSQLite package that R
loads. This means that you don't need to set up an independent server.
Of course, SQLite is "lite" compared to the "full function" database servers
previously mentioned. For SQLite, the R package is RSQLite and you can
look at the "README" here:
http://cran.r-project.org/web/packages/RSQLite/RSQLite.pdf
One thing nice about this is that it is the work of Mr. Wickham and is
generally compatible with his DBI package. This may be helpful because
you could start off "easy" with RSQLite, then "upgrade" to a "real
data base server" with most of the R coding remaining generally the
same.
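
To show what "embedded" means in practice, here is a minimal sketch using Python's built-in sqlite3 module rather than R (the RSQLite/DBI calls are named differently, but the idea is the same). The table name and the numbers are made up for illustration only.

```python
import sqlite3

# ":memory:" keeps the database in RAM; a file path would store it in one file.
# Either way there is no separate server process to install or start.
conn = sqlite3.connect(":memory:")

# Hypothetical table and made-up values, purely for illustration.
conn.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, age INTEGER, bmi REAL)")
conn.executemany(
    "INSERT INTO patients (age, bmi) VALUES (?, ?)",
    [(34, 22.5), (51, 27.1), (47, 30.4)],
)

# Ordinary SQL runs inside the same process as the calling program.
mean_bmi = conn.execute(
    "SELECT AVG(bmi) FROM patients WHERE age > 40"
).fetchone()[0]
print(round(mean_bmi, 2))

conn.close()
```

Because the SQL itself is plain, the same query text should carry over largely unchanged if you later move to a full database server.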

And one other thing that I will warn you about regarding this forum: it
is a "no homework" forum. This doesn't mean we won't help with general
questions about approaches and the like, but people can become a bit
"terse" if they feel you are trying to get someone to do your work. I
just mention this because it does come up on rare occasions.


>
>
> Thanks in advance,
> Eleftheria
>
> Eleftheria Vasileiou BSc, MPH
> Research Student, Centre for Population Health Sciences
> Room 815, Old Medical School, University of Edinburgh
>
> E.Vasileiou at ed.ac.uk



-- 
He's about as useful as a wax frying pan.

10 to the 12th power microphones = 1 Megaphone

Maranatha! <><
John McKown


[sqlite] sqlite3_column_count and sqlite3_data_count

2015-02-26 Thread Bart Smissaert
Well, sqlite3_column_count seems to have a clear advantage here
in that you don't have to worry about being in a row or not.
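
A rough analogy in Python, offered as a sketch only: Python's sqlite3 module does not expose either C function, so the comparison below rests on the assumption that cursor.description and fetchone() behave loosely like sqlite3_column_count() and sqlite3_data_count() respectively. The table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")  # hypothetical, left empty

# The table has no rows, so this SELECT never lands on a row.
cur = conn.execute("SELECT a, b FROM t")

# Column metadata is available even though no row was produced
# (loosely comparable to sqlite3_column_count()).
column_count = len(cur.description)

# There is no current row, so fetchone() returns None
# (loosely comparable to sqlite3_data_count() returning zero).
first_row = cur.fetchone()

print(column_count, first_row)
conn.close()
```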

RBS

On Thu, Feb 26, 2015 at 12:45 AM, Simon Slavin  wrote:

>
> On 26 Feb 2015, at 12:41am, Bart Smissaert 
> wrote:
>
> > OK, thanks for clearing that up. So there seems little point then in
> > using sqlite3_data_count?
>
> Use whichever one you prefer the sound of.  No clear advantage of either
> one.
>
> Simon.
>


[sqlite] sqlite3_column_count and sqlite3_data_count

2015-02-26 Thread Simon Slavin

On 26 Feb 2015, at 12:41am, Bart Smissaert  wrote:

> OK, thanks for clearing that up. So there seems little point then in
> using sqlite3_data_count?

Use whichever one you prefer the sound of.  No clear advantage of either one.

Simon.


[sqlite] recurrent failure mode

2015-02-26 Thread Simon Slavin

On 26 Feb 2015, at 12:26am, Dave Dyer  wrote:

>> Do you have any multi-access things going on ?  Two or more computers, 
>> applications, processes or threads trying to access the database at the same 
>> time ?
> 
> No, but it would be normal for the database to be on a different
> computer than the sqlite client, and be using whatever networked 
> file system is common.  The culprit clients seem to be macs, we're
> still seeking more information about the specifics.

Sorry to have to tell you, but almost no implementations of network file 
systems implement locking properly.  So any situations involving simultaneous 
writing can lead to corruption.  But if you do not have simultaneous writing, 
there's no good reason for database corruption.  What you need to find out next 
is:

- OS version that the computer holding the file is running.
- File system that the disk the file is on is formatted in.
- OS version that the computer accessing the file is running.
- Network protocol used to access the remote filespace (SMB or AFP or something 
  like that).

Earlier you wrote

> I suppose that this might be a sqlite bug if the "insert records" step
> and the "maintain indexes" step were separated by a disk error and the
> rollback of the failed transaction was incomplete.

Now that you've told me that you are checking the result codes returned by all 
calls, this would have to be a significant bug in SQLite, and I don't think I've 
heard of any other reports of it.

Simon.


[sqlite] sqlite3_column_count and sqlite3_data_count

2015-02-26 Thread Bart Smissaert
OK, thanks for clearing that up. So there seems little point then in
using sqlite3_data_count?

RBS


On Thu, Feb 26, 2015 at 12:16 AM, Igor Tandetnik  wrote:

> On 2/25/2015 7:03 PM, Bart Smissaert wrote:
>
>> Could somebody tell me what the difference is between these 2 functions?
>>
>
> sqlite3_column_count() returns the correct number as soon as the statement is
> prepared. sqlite3_data_count() returns zero unless the statement is actually
> positioned on a row (sqlite3_step was called, and returned SQLITE_ROW).
>
> Why both are needed, I'm not sure.
> --
> Igor Tandetnik
>
>


[sqlite] sqlite3_column_count and sqlite3_data_count

2015-02-26 Thread Bart Smissaert
Could somebody tell me what the difference is between these 2 functions?
I couldn't work it out from the documentation and have not been able to set up
an example where they produce different results.

RBS