[Corpora-List] Re: Any literature about tensors-based corpora NLP research with actual examples (and homework ;-)) you would suggest? ...

Anil Singh via Corpora Fri, 04 Aug 2023 10:51:46 -0700

I have been enjoying the discussion. I hope it will continue. I have learnt
some new things. I was also confused about the tensor thing, although not
in the same way.

I hope I am not one of the scare-quoted 'NLP practitioners', because
that's exactly what I like to call myself. I certainly don't think I am
qualified to work on language just because I can speak one.

I am currently reading your thesis and trying to digest it.

I also glanced through the syllabus you are preparing. I share your
interest in text encodings, among other things. I can't resist talking
about text encodings, whether I am teaching NLP or Computer Programming,
because I know first-hand the problems that text encodings cause when
doing NLP for low-resource languages.

If you can actually teach that syllabus, I envy you, as I am unable to get
people interested in even the very basics of language and linguistics.

About the importance of granularities: in my (very badly written) PhD
thesis, I explicitly discussed NLP problem formulation in terms of
granularities. In my second research paper, I used byte n-grams for
language identification. I use byte n-grams whenever I can. More precisely,
I used them for language-encoding pair identification, as there are so many
non-standard 'encodings' that were used, and perhaps still are used, for
South Asian languages. My very first attempt at doing some kind of NLP
(unsuccessful, or you may say unfinished), made even before I knew that a
field called NLP or CL existed, was building an encoding converter that
would work for all the 'encodings' used for Indian languages. I too wish
there were a good, comprehensive history of text encodings, including
non-standard, ad-hoc encodings.
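The byte n-gram idea can be illustrated with a minimal Python sketch. It is not the method from the papers mentioned above; the helper names (`byte_ngrams`, `cosine`, `train`, `identify`), the choice of byte trigrams, cosine similarity over raw count profiles, and the toy training sentences are all hypothetical choices made for illustration. Labels are (language, encoding) pairs, and because the input is never decoded, the same Hindi sentence stored as UTF-8 and as UTF-16-LE produces two distinct profiles.

```python
from collections import Counter
import math

# Hypothetical sketch: byte n-gram profiles + cosine similarity for
# (language, encoding) pair identification. Not the original author's code.


def byte_ngrams(data: bytes, n: int = 3) -> Counter:
    """Count overlapping byte n-grams in raw bytes; nothing is decoded."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(count * b[gram] for gram, count in a.items())  # missing grams count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def train(samples: dict, n: int = 3) -> dict:
    """Build one profile per (language, encoding) label from raw training bytes."""
    return {label: byte_ngrams(data, n) for label, data in samples.items()}


def identify(profiles: dict, data: bytes, n: int = 3) -> tuple:
    """Return the label whose profile is most similar to the input bytes."""
    query = byte_ngrams(data, n)
    return max(profiles, key=lambda label: cosine(profiles[label], query))


# Toy training data (assumed for illustration): one Hindi sentence stored in
# two encodings, plus one English sentence.
hindi = "यह एक छोटा सा उदाहरण वाक्य है"
samples = {
    ("hin", "utf-8"): hindi.encode("utf-8"),
    ("hin", "utf-16-le"): hindi.encode("utf-16-le"),
    ("eng", "utf-8"): "this is a small example sentence".encode("utf-8"),
}
profiles = train(samples)

print(identify(profiles, "यह वाक्य छोटा है".encode("utf-16-le")))  # ('hin', 'utf-16-le')
print(identify(profiles, "a small sentence".encode("utf-8")))       # ('eng', 'utf-8')
```

Working on bytes rather than decoded characters is what makes the encoding part of the label learnable: a character-level model would first need to know the encoding, which is exactly what is unknown for non-standard 'encodings'.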

I also share your interest in word-level language identification. In 2007 I
published one of the earliest papers on what I called language
identification in a multilingual document, where I tried word-level
language identification and what is now called language identification for
code-switched data.

About gender, I had actually made a kind of category assumption. I didn't
pay attention to the name, which you share with no less than Ada Byron.

We have to be tolerant of what you call bad research, for various
unavoidable reasons. Research is not what it used to be; at least, that's
my opinion. Still, in some ways it is better, perhaps as in the case of
gender representation.

About grammar, I have come to think of it as a kind of language model for
describing some linguistic phenomena. I once received a review in which
the reviewer pointed out some grammatical mistakes and wrote that you
can't just go by how the sentence or phrase sounds; you have to explicitly
check the grammar against the rules. Thank you very much, but I learnt
English without paying any explicit attention to grammar. I am pretty sure
I didn't learn much from explicit teaching of grammar, whether of English,
Sanskrit, or French. That doesn't necessarily mean I don't believe in
grammar, but I guess I am moving towards the language-games view of
language.

As to language being magical, well, that depends on what you mean by
magical. To me, it seems it is magical in the same sense as life itself is
magical. Nothing more, nothing less. I have even been known to call
computer programming magical, in a certain sense.

I also completely agree that we can only hope that we are communicating as
we intended, but we rarely, if ever, actually attain that goal.

I can't match your background, but I did have -- what can be called -- four
rounds of graduate training in different disciplines. I am still trying to
learn new things about language. However, I have no fieldwork experience
at all, which I regret. That is partly because I am not a social creature,
or, to be more precise (as if one can be precise with language), I am
socially totally incompetent. I wouldn't know how to approach anyone for
fieldwork in linguistics.

On Fri, Aug 4, 2023 at 9:03 PM Ada Wan via Corpora <corpora@list.elra.info>
wrote:

> @Toms:
> for completeness' sake: would you mind please sharing your background?
> Thanks.
>
> On Fri, Aug 4, 2023 at 5:31 PM Ada Wan <adawan...@gmail.com> wrote:
>
>> Thanks x2, Ibrtchx.
>>
>> On Fri, Aug 4, 2023 at 3:30 AM Albretch Mueller <lbrt...@gmail.com>
>> wrote:
>>
>>> On 8/3/23, Toms Bergmanis <toms.bergma...@tilde.lv> wrote:
>>>  ...
>>>
>>>  I, for one, have benefited from Ada's, as well as other member's
>>> suggestions and comments as I hope they have somehow benefited from
>>> mine.
>>>  lbrtchx
>>>
>> _______________________________________________
> Corpora mailing list -- corpora@list.elra.info
> https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
> To unsubscribe send an email to corpora-le...@list.elra.info
>

-- 
- Anil
