Re: [algogeeks] How to read multiple utf-8 encoded string from stdin
This approach only helps on Linux. When I try it on Windows with a UTF-8 encoded input file, I can't read the characters. Secondly, how do I count the non-ASCII characters in a UTF-8 string? Does anyone have an idea?

On Mon, Nov 25, 2013 at 11:50 AM, Karthikeyan V.B <kartmu...@gmail.com> wrote:
> From StackOverflow:
> fgets() can decode UTF-8 encoded files if you use Visual Studio 2005 and up. Change your code like this:
> infile = fopen(inname, "r, ccs=UTF-8");

--
You received this message because you are subscribed to the Google Groups Algorithm Geeks group. To unsubscribe from this group and stop receiving emails from it, send an email to algogeeks+unsubscr...@googlegroups.com.
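One thing that may differ on Windows (an assumption; the thread does not say what exactly failed) is that editors such as Notepad have commonly saved UTF-8 files with a leading byte order mark, the bytes EF BB BF. That mark is itself a non-ASCII character, so it would inflate the count for the first line. A minimal sketch of skipping it, where `skip_utf8_bom` is a hypothetical helper name:

```c
#include <string.h>

/* Hypothetical helper: returns a pointer past a leading UTF-8 byte order
 * mark (bytes EF BB BF) if the line starts with one, else the line itself. */
const char *skip_utf8_bom(const char *line)
{
    static const char bom[] = "\xEF\xBB\xBF";

    /* strncmp stops at the terminator, so short lines are safe */
    if (strncmp(line, bom, 3) == 0)
        return line + 3;
    return line;
}
```

It would be applied to the first line read from the file only, before any counting is done.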
Re: [algogeeks] How to read multiple utf-8 encoded string from stdin
Is this what you are asking for?

#include <stdio.h>
#include <string.h>

/* Thresholds used to check the number of bytes a character is using */
#define ASCII_BYTE  0x80
#define TWO_BYTE    0xC0
#define THREE_BYTES 0xE0
#define FOUR_BYTES  0xF0
#define FIVE_BYTES  0xF8
#define SIX_BYTES   0xFC
#define MASK_BYTE   0xFF
#define MAX_BUFF    200

int non_ascii_count(char arr[])
{
    unsigned int non_ascii = 0, count = 0, num = 0;
    char *ch = arr;

    while (*ch != '\0') {
        num = (unsigned int)(*ch);
        /* Only the last byte of the unsigned int is required */
        num = num & MASK_BYTE;
        /* ASCII: increment by one only */
        count = 1;
        /* Check for multi-byte only if it is not ASCII (value >= 128) */
        if (num >= ASCII_BYTE) {
            if (num < TWO_BYTE) {
                count = 1;  /* stray continuation byte: skip, do not count */
            } else if (num < THREE_BYTES) {
                count = 2;
            } else if (num < FOUR_BYTES) {
                count = 3;
            } else if (num < FIVE_BYTES) {
                count = 4;
            } else if (num < SIX_BYTES) {
                count = 5;
            } else {
                count = 6;
            }
            /* Increment the non-ASCII count once per lead byte */
            if (count > 1)
                non_ascii++;
        }
        /* Advance past the whole character, never past the terminator */
        while (count > 0 && *ch != '\0') {
            ch++;
            count--;
        }
    }
    return non_ascii;
}

int main(void)
{
    FILE *fd = stdin;
    /* Up to 6 bytes per character, plus 2 extra for '\n' and '\0' */
    char buff[MAX_BUFF * 6 + 2];

    memset(buff, 0, sizeof(buff));
    /* fgets reads at most one less than the provided length */
    while (NULL != fgets(buff, sizeof(buff), fd))
        printf("%d\n", non_ascii_count(buff));
    return 0;
}

On Tue, Nov 26, 2013 at 7:43 PM, Nishant Pandey <nishant.bits.me...@gmail.com> wrote:
> This approach only helps on Linux. When I try it on Windows with a UTF-8 encoded input file, I can't read the characters. Secondly, how do I count the non-ASCII characters in a UTF-8 string? Does anyone have an idea?

--
Regards,
Pradeep
Re: [algogeeks] How to read multiple utf-8 encoded string from stdin
The code is awesome, Pradeep, except for a few things:

1) The output is wrong in 2 cases:
   a) "x√ab c" is counted as 2 non-ASCII characters; it should be 1.
   b) "ɖ Ɛ" is counted as 4 non-ASCII characters; it should be 2.

On Tue, Nov 26, 2013 at 7:57 PM, Pradeep Dubey <pradeep.d...@gmail.com> wrote:
> Is this what you are asking for?
> [...]
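For reference, the two reported inputs encode as follows: √ is the three bytes E2 88 9A, ɖ is C9 96 and Ɛ is C6 90. A count that comes out too high may mean that some continuation bytes (10xxxxxx) were counted as characters of their own. A rough sketch of one way to avoid that is to step over each sequence by the length its lead byte announces. The names `utf8_seq_len` and `count_non_ascii_chars` are illustrative, and only the 1 to 4 byte forms of current UTF-8 are handled:

```c
#include <stddef.h>

/* Length in bytes of the UTF-8 sequence introduced by lead byte `b`
 * (1 for ASCII; 0 for a byte that cannot start a sequence). */
static size_t utf8_seq_len(unsigned char b)
{
    if (b < 0x80) return 1;            /* 0xxxxxxx: ASCII */
    if ((b & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((b & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((b & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;                          /* continuation or invalid lead byte */
}

/* Counts characters outside ASCII by stepping one whole sequence at a time. */
size_t count_non_ascii_chars(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t total = 0;

    while (*p != '\0') {
        size_t len = utf8_seq_len(*p);

        if (len == 0) {        /* stray byte: skip it without counting */
            p++;
            continue;
        }
        if (len > 1)
            total++;
        /* Advance len bytes, stopping early if the string ends mid-sequence */
        while (len-- > 0 && *p != '\0')
            p++;
    }
    return total;
}
```

With this stepping, "x√ab c" yields 1 and "ɖ Ɛ" yields 2, which are the expected values from the report.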
Re: [algogeeks] How to read multiple utf-8 encoded string from stdin
How about counting the non-ASCII characters in it? Merely checking each byte for ASCII doesn't help, as a UTF-8 character is itself a combination of bytes...

On Nov 25, 2013 11:50 AM, Karthikeyan V.B <kartmu...@gmail.com> wrote:
> From StackOverflow:
> fgets() can decode UTF-8 encoded files if you use Visual Studio 2005 and up. Change your code like this:
> infile = fopen(inname, "r, ccs=UTF-8");
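The combination is regular enough that no decoding is needed: in UTF-8 every non-ASCII character starts with exactly one lead byte of the form 11xxxxxx, and all of its remaining bytes have the form 10xxxxxx. Assuming valid input, as the problem statement promises, counting lead bytes therefore counts non-ASCII characters. A small sketch (the name `count_lead_bytes` is made up here):

```c
#include <stddef.h>

/* Counts bytes whose top two bits are both set (11xxxxxx). In valid UTF-8
 * there is exactly one such byte per non-ASCII character. */
size_t count_lead_bytes(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t total = 0;

    for (; *p != '\0'; p++) {
        if ((*p & 0xC0) == 0xC0)
            total++;
    }
    return total;
}
```

The cast to `unsigned char` matters: where plain `char` is signed, bytes of 0x80 and above would otherwise be negative before masking.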
Re: [algogeeks] How to read multiple utf-8 encoded string from stdin
From StackOverflow:

fgets() can decode UTF-8 encoded files if you use Visual Studio 2005 and up. Change your code like this:

infile = fopen(inname, "r, ccs=UTF-8");

On Sat, Nov 23, 2013 at 8:25 PM, Nishant Pandey <nishant.bits.me...@gmail.com> wrote:
> Q) *C program* that reads multiple UTF-8 encoded strings from STDIN (1 string per line), count all *non-ascii* characters (ascii characters are with ordinal decimal 0 to 127) and print the total non-ascii character count to STDOUT (1 number per line).
> [...]
[algogeeks] How to read multiple utf-8 encoded string from stdin
Q) Write a *C program* that reads multiple UTF-8 encoded strings from STDIN (1 string per line), counts all *non-ascii* characters (ASCII characters are those with ordinal decimal 0 to 127) and prints the total non-ASCII character count to STDOUT (1 number per line).

Constraints:
- You cannot use any *wchar.h* service in your program.
- The UTF-8 strings supplied to you can have *1 or more whitespaces* in them.
- No input string will have a character length greater than *200* (including spaces).
- You will be given multiple lines of input (1 string per line).
- Input will be limited to UTF-8 encoded strings and will not contain any garbage values.
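Putting the requirements together, here is one possible shape for the program, offered as a sketch rather than a reference solution. It reads byte by byte with `fgetc`, so it needs neither `wchar.h` nor a line buffer sized for the 200-character limit, and it relies on the guarantee that the input is valid UTF-8. `count_non_ascii_per_line` is a hypothetical name:

```c
#include <stdio.h>

/* Hypothetical driver: reads bytes from `in` until EOF and writes one
 * non-ASCII character count per input line to `out`.
 * Returns the number of lines processed. */
long count_non_ascii_per_line(FILE *in, FILE *out)
{
    long lines = 0;
    unsigned long count = 0;
    int seen = 0;   /* whether the current line has any bytes yet */
    int c;

    while ((c = fgetc(in)) != EOF) {
        if (c == '\n') {
            fprintf(out, "%lu\n", count);
            lines++;
            count = 0;
            seen = 0;
            continue;
        }
        seen = 1;
        /* A lead byte (11xxxxxx) marks the start of one non-ASCII character */
        if ((c & 0xC0) == 0xC0)
            count++;
    }
    if (seen) {     /* last line without a trailing newline */
        fprintf(out, "%lu\n", count);
        lines++;
    }
    return lines;
}
```

A `main` would simply call `count_non_ascii_per_line(stdin, stdout)` and return 0. Whitespace inside a line needs no special handling, because spaces and tabs are ASCII bytes and never match the lead-byte test.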