Re: [algogeeks] How to read multiple utf-8 encoded string from stdin

2013-11-26 Thread Nishant Pandey
This approach only helps on Linux; when I try it on Windows with a UTF-8 encoded
input file, I can't read the characters. Secondly, how do I count the non-ASCII
characters in a UTF-8 string? Does anyone have any idea about this?


On Mon, Nov 25, 2013 at 11:50 AM, Karthikeyan V.B kartmu...@gmail.com wrote:



   From StackOverflow,

 ---

 fgets() can decode UTF-8 encoded files if you use Visual Studio 2005 and
 up. Change your code like this:


 infile = fopen(inname, "r, ccs=UTF-8");



 On Sat, Nov 23, 2013 at 8:25 PM, Nishant Pandey 
 nishant.bits.me...@gmail.com wrote:

 Q) Write a *C program* that reads multiple UTF-8 encoded strings from STDIN (1
 string per line), counts all *non-ASCII* characters (ASCII characters are
 those with ordinal decimal 0 to 127) and prints the total non-ASCII character
 count to STDOUT (1 number per line).

 Constraints:


- You cannot use any *wchar.h* service in your program.
- The UTF-8 strings supplied to you can have *1 or more whitespaces* in
them.
- No input string will have a character length greater than *200*
(including spaces).
- You will be given multiple lines of input (1 string per line).
- Input will be limited to UTF-8 encoded strings and will not contain
any garbage values.

  --
 You received this message because you are subscribed to the Google Groups
 Algorithm Geeks group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to algogeeks+unsubscr...@googlegroups.com.




Re: [algogeeks] How to read multiple utf-8 encoded string from stdin

2013-11-26 Thread Pradeep Dubey
Is this what you are asking for?

#include <stdio.h>
#include <string.h>

/* Masks to check the number of bytes a character is using */
#define ASCII_BYTE 0x80
#define TWO_BYTE 0xC0
#define THREE_BYTES 0xE0
#define FOUR_BYTES 0xF0
#define FIVE_BYTES 0xF8
#define SIX_BYTES 0xFC

#define MASK_BYTE 0xFF
#define MAX_BUFF 200

int non_ascii_count(char arr[]){
    unsigned int non_ascii = 0, count = 0, num = 0;
    char *ch = arr;
    while (*ch != '\0') {
        num = (unsigned int)(*ch);
        /* Only the last byte of the uint is required */
        num = num & MASK_BYTE;
        /* Check for multi-byte only if it's not ASCII (value >= 128) */
        if (num & ASCII_BYTE) {
            /* Is a non-ASCII character */
            count = 0;
            if (num & TWO_BYTE) {
                count = 2;
            } else if (num & THREE_BYTES) {
                count = 3;
            } else if (num & FOUR_BYTES) {
                count = 4;
            } else if (num & FIVE_BYTES) {
                count = 5;
            } else if (num & SIX_BYTES) {
                count = 6;
            }
            /* Increment non-ASCII count and char pointer accordingly */
            non_ascii++;
            ch += count;
        }
        /* ASCII, increment by one only */
        ch++;
    }
    return non_ascii;
}

int main(void)
{
    FILE* fd = stdin;
    char buff[MAX_BUFF + 2]; /* 2 extra for \0 and \n */
    memset(buff, 0, sizeof(buff));
    /* fgets reads at most one less than the provided length, so adding 1 */
    while (NULL != fgets(buff, MAX_BUFF + 1, fd))
        printf("%d\n", non_ascii_count(buff));
    return 0;
}


-- 
Regards,
Pradeep



Re: [algogeeks] How to read multiple utf-8 encoded string from stdin

2013-11-26 Thread Nishant Pandey
The code is awesome, Pradeep, except for a few things:
1) The output is wrong in 2 cases:
   a) "x√ab c" is counted as 2 non-ASCII characters; it should be 1.
   b) "ɖ  Ɛ" is counted as 4 non-ASCII characters; it should be 2.
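
For comparison, here is a minimal sketch of a counter that does not skip ahead by a computed length at all. It relies on the fact that in valid UTF-8 (which the problem statement promises) every non-ASCII character starts with exactly one byte of the form 11xxxxxx, so counting those bytes counts the characters. The name count_non_ascii is made up for illustration and is not taken from the code posted in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Count the non-ASCII characters in a NUL-terminated UTF-8 string.
 * Assumption: the input is valid UTF-8, as the problem statement says. */
size_t count_non_ascii(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t count = 0;

    assert(s != NULL);
    for (; *p != '\0'; p++) {
        /* 0xxxxxxx: ASCII, not counted.
         * 10xxxxxx: continuation byte of a character already counted.
         * 11xxxxxx: first byte of a multi-byte (non-ASCII) character. */
        if ((*p & 0xC0) == 0xC0)
            count++;
    }
    return count;
}
```

With the two inputs above this returns 1 for "x√ab c" (√ is the three bytes E2 88 9A) and 2 for "ɖ  Ɛ" (the byte pairs C9 96 and C6 90).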




Re: [algogeeks] How to read multiple utf-8 encoded string from stdin

2013-11-25 Thread Nishant Pandey
How about counting the non-ASCII characters in it? Merely checking each byte
for ASCII doesn't help, since a UTF-8 character can itself be a combination of
several bytes.
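
To make the "combination" concrete: in UTF-8 the first byte of each character says how many bytes the character occupies, and all the remaining bytes look like 10xxxxxx. Below is a rough sketch, assuming the modern four-byte limit. The names utf8_sequence_length and utf8_char_count are illustrative and do not come from any library.

```c
#include <assert.h>
#include <stddef.h>

/* Return the length in bytes (1..4) of the UTF-8 sequence that starts with
 * `lead`, or 0 if `lead` is a continuation byte or not a valid lead byte. */
int utf8_sequence_length(unsigned char lead)
{
    if ((lead & 0x80) == 0x00) return 1;  /* 0xxxxxxx: ASCII */
    if ((lead & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;                             /* 10xxxxxx or invalid */
}

/* Count characters (not bytes) in a NUL-terminated UTF-8 string. */
size_t utf8_char_count(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t chars = 0;

    assert(s != NULL);
    while (*p != '\0') {
        int len = utf8_sequence_length(*p);
        if (len == 0)
            len = 1;  /* stray byte: step over it and count it once */
        /* Advance one byte at a time so that a truncated sequence can
         * never carry the pointer past the terminating NUL. */
        while (len-- > 0 && *p != '\0')
            p++;
        chars++;
    }
    return chars;
}
```

For the string "x√ab c", strlen() reports 8 bytes while utf8_char_count() reports 6 characters.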


Re: [algogeeks] How to read multiple utf-8 encoded string from stdin

2013-11-24 Thread Karthikeyan V.B
  From StackOverflow,

---

fgets() can decode UTF-8 encoded files if you use Visual Studio 2005 and
up. Change your code like this:


infile = fopen(inname, "r, ccs=UTF-8");
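
As far as I know, the ccs=UTF-8 flag is specific to the Microsoft C runtime and is aimed at the wide-character stream functions, which the wchar.h restriction in the question rules out. Since only raw bytes need to be inspected, an alternative worth trying is to leave stdin as a plain byte stream and, on Windows, switch it to binary mode so the runtime does not translate anything. This is a hedged sketch: use_raw_stdin is a hypothetical helper name, while _setmode, _fileno and _O_BINARY come from the Microsoft C runtime.

```c
#include <stdio.h>

#ifdef _WIN32
#include <fcntl.h>  /* _O_BINARY */
#include <io.h>     /* _setmode */
#endif

/* Put stdin into binary mode on platforms that distinguish text streams
 * from binary streams. Returns 0 on success and -1 on failure. */
int use_raw_stdin(void)
{
#ifdef _WIN32
    /* _setmode returns the previous mode, or -1 if the call fails. */
    return _setmode(_fileno(stdin), _O_BINARY) == -1 ? -1 : 0;
#else
    return 0;  /* nothing to do: POSIX streams do not translate bytes */
#endif
}
```

In binary mode each input line on Windows ends in "\r\n". Both bytes are ASCII, so they do not change a non-ASCII count.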





[algogeeks] How to read multiple utf-8 encoded string from stdin

2013-11-23 Thread Nishant Pandey
Q) Write a *C program* that reads multiple UTF-8 encoded strings from STDIN (1
string per line), counts all *non-ASCII* characters (ASCII characters are
those with ordinal decimal 0 to 127) and prints the total non-ASCII character
count to STDOUT (1 number per line).

Constraints:


   - You cannot use any *wchar.h* service in your program.
   - The UTF-8 strings supplied to you can have *1 or more whitespaces* in
   them.
   - No input string will have a character length greater than *200*
   (including spaces).
   - You will be given multiple lines of input (1 string per line).
   - Input will be limited to UTF-8 encoded strings and will not contain
   any garbage values.
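
One way to structure the whole program under these constraints, offered as a sketch and not as a reference solution. It assumes "character length" means code points, so a 200-character line can occupy up to 800 bytes, and it assumes the input is valid UTF-8 as promised. MAX_CHARS, LINE_BYTES and count_lines_non_ascii are names chosen here for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Up to 200 characters per line, and a UTF-8 character takes at most
 * 4 bytes. Three extra bytes leave room for "\r\n" and the NUL. */
#define MAX_CHARS 200
#define LINE_BYTES (MAX_CHARS * 4 + 3)

/* Read lines from `in` and write one non-ASCII character count per line
 * to `out`. Returns the number of lines processed. */
long count_lines_non_ascii(FILE *in, FILE *out)
{
    char line[LINE_BYTES];
    long lines = 0;

    assert(in != NULL && out != NULL);
    while (fgets(line, sizeof line, in) != NULL) {
        const unsigned char *p = (const unsigned char *)line;
        unsigned long non_ascii = 0;

        for (; *p != '\0'; p++) {
            /* 11xxxxxx marks the first byte of a multi-byte character;
             * continuation bytes (10xxxxxx) are deliberately skipped. */
            if ((*p & 0xC0) == 0xC0)
                non_ascii++;
        }
        fprintf(out, "%lu\n", non_ascii);
        lines++;
    }
    return lines;
}
```

A main() that calls count_lines_non_ascii(stdin, stdout) and then returns 0 completes the program. No wchar.h service is involved, because the bytes are only inspected and never decoded.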
